The three most important criteria by which to judge file or document classification and coding systems are consistency, consistency, and consistency. The reason is pretty obvious: without consistency, a file classification scheme cannot deliver any of its promised downstream benefits, such as enhanced retrievability, selection of appropriate retention schedules, and setting appropriate security access permissions […]

Document images often have quality issues that make it difficult to extract text or data elements from them. For example, forms can have lines running through much of the text, watermarks can interfere with text recognition, and text orientation may be skewed. Once the specific issues have been identified, advanced image enhancement techniques can greatly improve the quality and quantity […]
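Skew is the easiest of these issues to illustrate in a few lines. The sketch below uses projection-profile skew estimation, a common technique named here as an example rather than the method the full article describes: treat dark pixels as (x, y) points, try a range of candidate angles, and keep the angle at which the horizontal projection is most sharply peaked, since that is when text lines line up with pixel rows. The function names, the ±10° search range, and the 0.5° step are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative sketch of projection-profile skew estimation. Function names,
# search range, and step size are assumptions, not a specific product's method.

def projection_score(points, angle_deg):
    """Score how sharply ink concentrates in rows after undoing a skew of angle_deg."""
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    rows = Counter()
    for x, y in points:
        # Row coordinate of the point after rotating it by -angle_deg
        y_rot = y * cos_t - x * sin_t
        rows[math.floor(y_rot)] += 1
    # Sum of squared row counts is largest when text lines fall into few rows
    return sum(count * count for count in rows.values())

def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate skew angle (degrees) that best aligns text lines."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda angle: projection_score(points, angle))

# Usage: five synthetic text lines, each 200 pixels long, rotated by 3 degrees
skew = math.radians(3.0)
points = [
    (x * math.cos(skew) - y0 * math.sin(skew), x * math.sin(skew) + y0 * math.cos(skew))
    for y0 in (0.5, 10.5, 20.5, 30.5, 40.5)
    for x in range(200)
]
print(estimate_skew(points))  # 3.0
```

In a real pipeline the points would probably come from a binarized scan, and the page would then be rotated by the negative of the estimated angle before text recognition.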

PDF standards enable users to embed non-visible metadata within PDFs as attribute name/value pairs. This feature can be used to embed referential metadata that is normally stored and used outside the files, to help users find or otherwise work with them. Here are some reasons why embedding metadata values can be a […]
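As a rough illustration of what such name/value pairs look like inside a file, the sketch below serializes a Python dict into a PDF document information dictionary (the classic `/Info` dictionary, which accepts custom keys alongside standard ones such as `/Title`; newer PDF versions favor XMP metadata streams). It escapes special characters in key names as `#xx` codes and writes values as literal strings, or as UTF-16BE hex strings when they contain non-ASCII text. The helper names and the example keys (`RetentionClass`, `Case Number`) are hypothetical, and actually writing the dictionary into a PDF would normally be left to a PDF library.

```python
# Hypothetical helpers: serialize custom metadata as a PDF /Info dictionary.

def pdf_name(key: str) -> str:
    """Encode a key as a PDF name object, escaping delimiters and spaces as #xx."""
    out = ["/"]
    for b in key.encode("utf-8"):
        ch = chr(b)
        if 0x21 <= b <= 0x7E and ch not in "()<>[]{}/%#":
            out.append(ch)
        else:
            out.append(f"#{b:02X}")
    return "".join(out)

def pdf_text_string(value: str) -> str:
    """Encode a value as a PDF text string."""
    if value.isascii():
        # Literal string: backslash and parentheses must be escaped
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    # Non-ASCII text: UTF-16BE with a byte-order mark, written as a hex string
    return "<" + ("\ufeff" + value).encode("utf-16-be").hex().upper() + ">"

def info_dictionary(metadata: dict[str, str]) -> str:
    """Build the /Info dictionary body from attribute name/value pairs."""
    entries = " ".join(f"{pdf_name(k)} {pdf_text_string(v)}" for k, v in metadata.items())
    return f"<< {entries} >>"

meta = {"Title": "Invoice 1042", "RetentionClass": "FIN-7Y", "Case Number": "A(1)"}
print(info_dictionary(meta))
# << /Title (Invoice 1042) /RetentionClass (FIN-7Y) /Case#20Number (A\(1\)) >>
```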

“Unstructured” content is a term used to describe content stored on file shares, personal computing devices, and content management systems. A major challenge to making effective use of such content is that words can have multiple meanings, and a name can refer to more than one person. Even worse, there can be multiple forms of […]

Negation is a powerful new tool used to identify high-value words or graphical elements in documents, detect patterns across document types, and add a new dimension to Boolean logic. The idea is simple: within clusters of visually similar documents, the words and graphical elements that differentiate one document from another are the ones that don’t occur in the same […]
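One simple reading of the idea, sketched under assumptions: represent each document in a cluster of visually similar documents as a set of words, treat the words present in every document as the shared template, and negate that template to leave the words that set each document apart. The excerpt stops before saying exactly which occurrences are excluded, so this intersection rule, the tokenizer, and the function names are illustrative assumptions. The sketch also ignores graphical elements and word positions.

```python
import re

# Hypothetical sketch: template words shared by a whole cluster are "negated",
# leaving the terms that distinguish each document.

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def differentiating_terms(cluster: dict[str, str]) -> dict[str, set[str]]:
    """Map each document ID to the words not shared by every document in the cluster."""
    if not cluster:
        return {}
    token_sets = {doc_id: tokenize(text) for doc_id, text in cluster.items()}
    # Words present in every document: the cluster's shared template text
    template = set.intersection(*token_sets.values())
    # Everything else distinguishes one document from its look-alikes
    return {doc_id: tokens - template for doc_id, tokens in token_sets.items()}

cluster = {
    "inv-001": "Invoice No. 1042 Bill To: Acme Corp Total Due",
    "inv-002": "Invoice No. 2291 Bill To: Globex Ltd Total Due",
}
for doc_id, terms in differentiating_terms(cluster).items():
    print(doc_id, sorted(terms))
# inv-001 ['1042', 'acme', 'corp']
# inv-002 ['2291', 'globex', 'ltd']
```

The template words (invoice, bill, total, and so on) drop out, leaving invoice numbers and customer names, which are the values most likely to be worth extracting.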

Documents in file shares, content management systems, and scanned archives are often described as “unstructured.” However, there is typically a high level of structure in and interconnectedness among those documents. This structure and interconnectedness occur because specific document types contain recurring attributes or data elements, and those attributes or data elements are shared with other […]
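The interconnectedness described here can be sketched by indexing documents on their extracted (attribute, value) pairs and linking any two documents that share a pair. The document IDs, attribute names, and values below are hypothetical, and the sketch assumes the attributes have already been extracted and normalized so that equal values compare equal.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch: link documents that share an extracted attribute value.

def link_documents(docs: dict[str, dict[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Return {(doc_a, doc_b): {"attribute=value", ...}} for every shared pair."""
    index = defaultdict(set)  # (attribute, value) -> IDs of documents containing it
    for doc_id, attrs in docs.items():
        for name, value in attrs.items():
            index[(name, value)].add(doc_id)
    links = defaultdict(set)
    for (name, value), ids in index.items():
        # Every pair of documents sharing this attribute value gets a link
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(f"{name}={value}")
    return dict(links)

docs = {
    "PO-77": {"doc_type": "purchase order", "po_number": "77", "vendor": "Acme"},
    "INV-1042": {"doc_type": "invoice", "po_number": "77", "vendor": "Acme",
                 "invoice_number": "1042"},
    "RCPT-9": {"doc_type": "receipt", "invoice_number": "1042"},
}
for pair, shared in sorted(link_documents(docs).items()):
    print(pair, sorted(shared))
# ('INV-1042', 'PO-77') ['po_number=77', 'vendor=Acme']
# ('INV-1042', 'RCPT-9') ['invoice_number=1042']
```

The shared PO number and vendor tie the invoice to its purchase order, and the invoice number ties it to the receipt. Real collections would likely also need value normalization (for example, "0077" versus "77") and matching across differently named attributes, which this sketch leaves out.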
