The three most important criteria by which to judge file or document classification and coding systems are consistency, consistency, and consistency. The reason is pretty obvious: without consistency, a file classification scheme cannot deliver any of the promised downstream benefits, things like enhanced retrievability, selection of appropriate retention schedules, and setting appropriate security access permissions […]

In simplest terms, information security involves identifying and protecting information that could somehow damage an organization legally or competitively if it were misused. Achieving those objectives in unstructured content is far easier if the organization first classifies documents by document type and evaluates the types and levels of risk associated with each type. Once that […]

“Unstructured” content is a term used to describe content stored on file shares, personal computing devices, and content management systems. A major challenge to making effective use of such content is that words can have multiple meanings, and a name can refer to more than one person. Even worse, there can be multiple forms of […]

Negation is a powerful new tool used to identify high-value words or graphical elements in documents, detect patterns across document types, and add a new dimension to Boolean logic. The idea is simple: within clusters of visually similar documents, the words and graphical elements differentiating one document from another are the ones that don’t occur in the same […]

Documents in file shares, content management systems, and scanned archives are often described as “unstructured.” However, there is typically a high level of structure in and interconnectedness among those documents. This structure and interconnectedness occur because specific document types contain recurring attributes or data elements and those attributes or data elements are shared with other […]

Because of the significant reputational and financial consequences of failing to protect content containing personally identifiable information (“PII”), corporations and governmental agencies have made it a major goal to identify and protect such content. Privacy expectations arise from a number of laws in different jurisdictions and are sometimes referred to by various acronyms such as […]

Writers in the information management space often speak of structured vs. unstructured data and then analyze documents as if they were “unstructured.” However, when documents are clustered by visual similarity, they are actually fairly structured within clusters, e.g., invoices, letters, and emails each have recurring attributes or data elements located in generally the same place […]
