Most file auto-classification systems rely on the presence of accurate textual representations of the files being classified. Organizations that use those auto-classification systems need to be aware of several problems with a text-reliant approach: Ignoring Non-Textual Files. Many files have no text associated with them, e.g., files output as PDF or TIF from user software or captured as […]

The three most important criteria by which to judge file or document classification and coding systems are consistency, consistency, and consistency. The reason is pretty obvious: without consistency, a file classification scheme cannot deliver any of the promised downstream benefits, such as enhanced retrievability, selection of appropriate retention schedules, and setting appropriate security access permissions […]

Gartner’s Market Guide for File Analysis Software, released on August 4, 2015, features BeyondRecognition along with 25 other providers. BeyondRecognition is the only provider that classifies documents based on their visual appearance, not on a text-based analysis. The Guide, written by Alan Dayley, Debra Logan, Jie Zhang, and Garth Landers, identifies the key use cases for […]

Whether an organization is trying to consolidate individual information silos, incorporate content acquired by merger or acquisition, or permit federated search, the challenge is determining where true north lies in terms of classifying content. That is, to find a way to first group or classify documents consistently regardless of any one person’s view or assessment […]

In simplest terms, information security involves identifying and protecting information that could somehow damage an organization legally or competitively if it were misused. Achieving those objectives in unstructured content is far easier if the organization first classifies documents by document type and evaluates the types and levels of risk associated with each type. Once that […]

“Unstructured” content is a term used to describe content stored on file shares, personal computing devices, and content management systems. A major challenge to making effective use of such content is that words can have multiple meanings, and a name can refer to more than one person. Even worse, there can be multiple forms of […]

Text analytics does some remarkable things with what it’s able to see, but in one critical aspect it is a giant leap backwards to the days of telegraphs and stock ticker tapes when information was delivered on continuous strips of paper with just numbers, letters, and basic punctuation printed on them. In those days, the […]

All document-related information governance initiatives rest on and depend upon consistent, comprehensive document classification. Without consistent, comprehensive classification, an organization can’t determine what to keep, how long to keep it, who should have access to it, and where to store it. For that reason, large organizations look to “auto-classification” to obtain the needed consistency at the […]

Most articles and blog posts about information governance provide very little in the way of new insight about how to accomplish information governance in major corporations. They typically just elaborate on what everyone already knows ought to be done, not how to do it. Some may provide further data points quantifying the problem (e.g., companies are accumulating data at […]

For many years information governance failed to achieve many of its goals primarily because document classification, which is the necessary first step in nearly all IG initiatives, has proven difficult to achieve on an enterprise scale. No more. 2015 marks the beginning of a whole new era. Background: People involved in information governance have long known […]
