Whether an organization is trying to consolidate individual information silos, incorporate content acquired by merger or acquisition, or permit federated search, the challenge is finding true north for classifying content. That means first finding a way to group or classify documents consistently, regardless of any one person’s view or assessment of them, and then determining what label to apply to each grouping or cluster.

Without consistent classification, there can be no governance and no reliable way to find things easily. Classification sheds needed light on otherwise dark data.

Once true north is determined, different business users in an organization can use different labels to point to the same content. Different classification schemes can be mapped to true north and hence to each other.
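As a minimal sketch of that mapping idea, suppose each self-formed grouping carries a stable identifier and each business unit keeps its own label-to-grouping table. The scheme names, labels, cluster IDs, and the `translate` helper below are all hypothetical and only illustrate how two schemes can be related through a shared reference point.

```python
# Hypothetical example: two departments label the same groupings differently.
# The cluster IDs stand in for "true north"; every name here is illustrative.
legal_scheme = {"Vendor Agreement": "cluster-017", "Invoice": "cluster-042"}
finance_scheme = {"Supplier Contract": "cluster-017", "AP Invoice": "cluster-042"}


def translate(label, source, target):
    """Return the labels in `target` that point to the same cluster as `label` in `source`."""
    cluster_id = source.get(label)
    if cluster_id is None:
        return []  # the source scheme does not know this label
    return sorted(name for name, cid in target.items() if cid == cluster_id)


print(translate("Vendor Agreement", legal_scheme, finance_scheme))
# → ['Supplier Contract']
```

Because both tables point at the same identifiers, neither department has to adopt the other's vocabulary.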

Visual Classification. Visual classification gives organizations a way to find true north in their content. The system automatically groups documents or files based on their visual appearance in a completely objective fashion, i.e., without users defining rules, writing scripts, or selecting exemplars or seed sets.

Because the initial groupings are self-forming, they are not subject to the personal biases of any individual. Because they are based on visual similarity rather than textual content, they do not depend on the availability of high-quality textual representations of documents. The groupings are comprehensive and consistent regardless of the file types storing the content.
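This article does not describe how visual similarity is actually computed, so the following is only a toy illustration using a stand-in technique: a simple average hash over tiny grayscale page thumbnails, grouped greedily by Hamming distance. The function names, the `max_distance` threshold, and the sample pixel grids are all assumptions made for the example. It is meant to show that groupings can form from appearance alone, with no text, rules, or seed sets.

```python
def average_hash(pixels):
    """Turn a grayscale grid into bits: 1 where a pixel is brighter than the page mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def group_by_appearance(pages, max_distance=1):
    """Greedily place each page in the first group whose representative hash is close enough."""
    groups = []  # list of (representative_hash, [page names])
    for name, pixels in pages.items():
        page_hash = average_hash(pixels)
        for representative, members in groups:
            if hamming(representative, page_hash) <= max_distance:
                members.append(name)
                break
        else:
            groups.append((page_hash, [name]))
    return [members for _, members in groups]


# Hypothetical 2x4 thumbnails: two pages share a layout, one does not.
pages = {
    "invoice_a": [[0, 0, 255, 255], [0, 0, 255, 255]],
    "invoice_b": [[10, 5, 250, 240], [0, 20, 255, 245]],
    "letter": [[255, 255, 255, 255], [0, 0, 0, 0]],
}
print(group_by_appearance(pages))
# → [['invoice_a', 'invoice_b'], ['letter']]
```

No text is read and no exemplars are chosen in advance; the only tunable here is the distance threshold, which a production system would handle very differently.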

SME Input. Examining the largest groupings first, subject matter experts (SMEs) can review the clusters or groupings containing over 99% of the documents in an organization in less than a week. Because representative documents from entire collections can be examined in such short time frames, it is cost-effective to have teams of SMEs examine the same documents at the same time, ensuring that different perspectives are taken into consideration in developing document type taxonomies or trees.
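The arithmetic behind the largest-first approach can be sketched as follows. The cluster sizes are invented for illustration, and `clusters_needed` is a hypothetical helper. The point is that when a few groupings hold most of the documents, reviewing a short list of clusters covers nearly the whole collection.

```python
def clusters_needed(cluster_sizes, target=0.99):
    """Count how many clusters, taken largest first, are needed to cover `target` of all documents."""
    total = sum(cluster_sizes)
    if total == 0:
        return 0
    covered = 0
    for count, size in enumerate(sorted(cluster_sizes, reverse=True), start=1):
        covered += size
        if covered >= target * total:
            return count
    return len(cluster_sizes)


# Illustrative sizes only: 1,000 documents spread over 7 clusters.
sizes = [500, 300, 150, 45, 3, 1, 1]
print(clusters_needed(sizes))  # → 4
```

In this made-up collection, four of the seven clusters already account for 995 of the 1,000 documents.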

Audit. Visual similarity can also be used to audit existing classified content for consistency. It can reveal where the same types of documents are being classified differently. This can happen where classification schemes have been in place for a long time and the people who took over responsibility for earlier schemes did not fully understand them and developed new classifications that were essentially redundant.
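One way to picture such an audit, assuming each document already has both a visual cluster ID and a legacy label, is to list the clusters whose members carry more than one label. The record layout, labels, and function name below are hypothetical.

```python
from collections import defaultdict


def find_inconsistent_clusters(records):
    """Return clusters whose documents carry more than one existing label.

    `records` is an iterable of (cluster_id, existing_label) pairs.
    """
    labels_by_cluster = defaultdict(set)
    for cluster_id, label in records:
        labels_by_cluster[cluster_id].add(label)
    return {
        cluster_id: sorted(labels)
        for cluster_id, labels in labels_by_cluster.items()
        if len(labels) > 1
    }


# Illustrative data: visually identical forms filed under two legacy labels.
records = [
    ("cluster-017", "Vendor Agreement"),
    ("cluster-017", "Supplier Contract"),
    ("cluster-042", "Invoice"),
    ("cluster-042", "Invoice"),
]
print(find_inconsistent_clusters(records))
# → {'cluster-017': ['Supplier Contract', 'Vendor Agreement']}
```

Each flagged cluster is a candidate for review; whether the labels are truly redundant is still a judgment for the people who own the schemes.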

Proof-of-Concept. If you’d like to see how well visual classification performs on your content, contact BeyondRecognition by emailing us at info@beyondrecognition.net or by submitting the online contact form. We would be happy to discuss processing a reasonable number of files for your review, typically about a terabyte.
