All document-related information governance initiatives depend on consistent, comprehensive document classification. Without it, an organization can’t determine what to keep, how long to keep it, who should have access to it, or where to store it.
For that reason, large organizations look to “auto-classification” to achieve that consistency at the volume required to keep pace with burgeoning document collections. However, just as automobiles vary widely in their operating characteristics, so do software solutions that claim to provide “auto-classification.”
Here are five questions to determine whether the “auto” in any particular case has to be hand-cranked or pushed to get started, and whether it can change direction or speed up as needed.
- What Type of Fuel Does It Require? Does the classification engine run only on premium racing fuel? Stated differently, does it require full and accurate textual representations of the documents being classified? If so, note that full and accurate text is not available for many documents encountered outside the vendor’s testing environment. Scanned documents and files saved to PDF from many applications do not have full and accurate text. In some collections, as many as half of the document files may have no associated text. A “comprehensive” solution runs on a wide range of “fuel.”
- How Much Assembly Is Required? Does it take an army of hourly-billing consultants in full “land-and-expand” mode to get the classification engine to the point where it can classify even a hundred document types? If classification depends on consultants writing classification rules, selecting exemplars, or “tuning” results, the initial startup and ongoing operating expenses will be significant.
- Street Legal or Professional Driver, Closed Course Only? What happens in the real world when things change? For example, a new company is acquired and a whole new set of records needs to be ingested and integrated, or a new federal regulation forces the rewriting of many form documents. Does the system choke, leaving you with no practical way to absorb the new records?
- Speed and Range? Does the “auto” classification engine have the speed and range to cover the ground the enterprise covers? Or is it an interesting solution that only works at the department level?
- What Accessories? Does the “auto” classification solution provide a full suite of related functionality, or does the buyer have to shop separately for air conditioning, tires, and sound systems?
Write down the answers that apply to your “auto” classification system, and then compare them to the answers for visual classification:
- Fuel: Visual classification analyzes visual representations of documents, normalizing them regardless of the file type used to store and display them. This lets visual classification consistently classify documents whether or not they have any associated text.
- Assembly: Visual classification clusters visually similar documents automatically; there are no rules to write or exemplars to select. Client subject matter experts can immediately begin evaluating documents to determine which ones to keep, what to call them, and what attributes to extract from them.
- Street Legal: Visual classification alerts users when new clusters form, flagging new content so that it can be quickly ingested.
- Speed and Range: Visual classification scales to handle enterprise-wide collections from the world’s largest organizations.
- Accessories: BeyondRecognition (BR) functionality includes collection, evaluation, export, redaction, and “Find,” a search function that supports fixed and relative positional operators; full-range searching of dates, numbers, and words; and glyph or graphical-element searching.
If you’d like to see how BeyondRecognition could put you in the driver’s seat on the road to your information governance destination, contact IGDoneRight@BeyondRecognition.net or use the contact form beneath this posting.
For a simple explanation of visual classification, see https://www.beyondrecognition.net/technology or, for brief animations of collection, classification, document type labeling, and attribution, visit https://beyondrecognition.net/short-video-clips/