Most of us have heard about the parable of the six blind men and the elephant – it may actually be the first recorded instance of faceted classification. Six blind men touched different parts of an elephant and each described a completely different thing based on their own perspective or “view” of the elephant: The one who felt a tusk reported it as a pipe, the one who felt an ear reported a fan, the belly was reported as a wall, the trunk as a branch, a leg was reported as a pillar and the tail was described as being a rope.
This story illustrates several important information governance lessons:
Different Stakeholders Have Different Views & Needs. People’s views of and information needs from any given corpus of documents will vary according to where they are in an organization and the functions they perform. People with different roles in a company will naturally be interested in different attributes of the documents in the corpus and may well use different descriptors when describing or trying to find some of them. While some document attributes are common to all stakeholders, others, namely those which enable an individual or group to perform their job function within an organization, are probably not.
Here is an example of how different roles will be interested in different attributes:
The Offshore Power Plant
- Stakeholders in the tax department need to know whether an expenditure on a sub-sea turbine, a critical component on a key project, can be categorized as an operating or capital expense in the jurisdiction where the project’s work is being performed.
- The plant maintenance department needs to know:
  - When the warranty period kicks in.
  - Part numbers, nomenclature and service level.
- Environmental Safety & Health wants to know that the team and all the contractors associated with the project are properly qualified and sanctioned to install the turbine to the engineering design specifications and to operate it within the specified tolerances.
- RIM and Compliance wants to track the locations of all relevant project documents for information lifecycle management, disposition and regulatory reasons.
- IT needs to ensure that business critical documents are backed up for disaster recovery and business continuity purposes.
- InfoSec wants to know that the IT group has the requisite information to ensure that the IP associated with the project is properly secured and that the people who access the content have the proper authorization to do so.
Some or all of the information required by the stakeholders above will be objectively evident on the face of individual documents. Other “subjective” attributes may have to be assigned (e.g., “project lead engineer”) by knowledge workers with specific domain expertise, and other more granular data elements (e.g., installation location) may have to be assigned by linking attributes from other authoritative data sources or systems of record.
Just preserving documents without having a systematic, dynamically updatable and holistic view created by assimilating other interrelated data points will result in an incomplete picture of a project or process. Without a holistic way of assembling and viewing all the extracted document attributes of interest to the various stakeholders, the overarching information governance needs of the organization will never be met. There will be incomplete, ambiguous, erroneous and superfluous data points.
Limited Data Points Mean Incomplete or Distorted Pictures. As the elephant parable illustrates, having only one or a few attributes available results in an incomplete or distorted picture of what is being managed. The blind men’s picture is so distorted, in fact, that when word reaches them of an elephant rumbling through cane fields and destroying them in search of food, they have no adequate description for the sum of the parts, and thus no way of applying their individually assimilated knowledge in a holistic fashion. The more uniform, accurate and persistent the available document attributes or facets, the greater the organization’s ability to assimilate seemingly disparate information into a more accurate picture of present and future state projects.
Duplicated Data Sets. Without a holistic enterprise content plan, each stakeholder starts keeping their own copies of documents so they can extract the attributes they are interested in. The result is multiple copies of the same documents, multiple expenditures to extract the same attributes, and inconsistencies in how the same data is extracted and stored.
The challenge described above is endemic. It exists across all types of businesses in every jurisdiction. Corporations of all sizes are dealing with big data symptoms and are stymied when it comes to finding a cure, which prior technology has not been able to provide.
Standing apart from the herd is Continuum Advisors. At Continuum, we believe in using powerful emerging technology to help our clients address their most daunting data management challenges. To that end, we have incorporated BeyondRecognition (“BR”) in our services matrix for IG, legal, information security, RIM and a host of other initiatives that require powerful, scalable data analytics.
BR is a radically new, data-driven information governance technology that meets the IG needs of multiple stakeholders in any enterprise, public or private. Continuum has implemented BR technology at multiple Fortune 500 clients with great success.
The highly experienced Continuum Advisors team chose to align with BR as it is the only technology in the world that automatically classifies electronic files or scanned paper documents based on their visual characteristics, without anyone having to spend time writing rules to identify each type of document or designating exemplars for each document type. This is tremendously important because accurate, consistent classification is the bedrock upon which all IG initiatives are built. BR solves this long-standing, previously intractable problem.
Subject matter experts can quickly determine how to classify all the documents in a document cluster by examining one or two documents per cluster. They can also associate a document type name with the cluster based on their organization’s document classification tree, and assign retention periods based on the classification.
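The cluster-level review described above can be pictured with a minimal Python sketch. It is illustrative only and does not represent BR's actual interface: the `Document` record, the `label_cluster` function, and the retention periods in `RETENTION_YEARS` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    doc_id: str
    cluster_id: int                      # cluster assigned by visual similarity
    doc_type: Optional[str] = None       # filled in after expert review
    retention_years: Optional[int] = None


# Hypothetical slice of a classification tree: document type -> retention period.
RETENTION_YEARS = {"invoice": 7, "warranty certificate": 10}


def label_cluster(documents: List[Document], cluster_id: int, doc_type: str) -> int:
    """Apply one expert decision to every document in the given cluster."""
    years = RETENTION_YEARS[doc_type]
    labelled = 0
    for doc in documents:
        if doc.cluster_id == cluster_id:
            doc.doc_type = doc_type
            doc.retention_years = years
            labelled += 1
    return labelled


docs = [Document("a", 1), Document("b", 1), Document("c", 2)]
count = label_cluster(docs, cluster_id=1, doc_type="invoice")
```

The point of the sketch is that a single decision, made after looking at one or two documents, propagates the document type and its retention period to the whole cluster; documents in other clusters stay unlabelled until they are reviewed.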
Our subject matter experts in energy, financial services, and pharmaceuticals work with corporate knowledge workers to extract multiple attributes from each document classification by “painting,” i.e., clicking and dragging boxes, on an image of a document from each cluster. BR then automatically extracts the specified attributes and associates each attribute with the attribute or field names assigned by the subject matter experts. The extracted data can then be loaded into the appropriate content management system.
The extracted attributes enable the BR-processed documents to be associated with management control systems, e.g., pipeline planning and maintenance, capital asset acquisition, or ESH inspections. They also provide multiple views into the document collection.
The extracted attribute values can be normalized prior to loading into the target system or the extracted values can be used to update and validate existing field authority lists.
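As a rough illustration of what normalizing and validating against an authority list can involve, the Python sketch below reconciles extracted values with an existing list. The `normalize` and `reconcile` functions and the sample values are assumptions made for this example, not part of BR or of any particular content management system.

```python
import re
from typing import Iterable, List, Tuple


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so variants of a value compare equal."""
    return re.sub(r"\s+", " ", value).strip().upper()


def reconcile(extracted: Iterable[str], authority: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split extracted values into authority-list matches and new candidates."""
    canonical = {normalize(v): v for v in authority}
    matched, candidates = [], []
    for raw in extracted:
        key = normalize(raw)
        if key in canonical:
            matched.append(canonical[key])    # load the canonical spelling
        else:
            candidates.append(raw.strip())    # hold for review before adding to the list
    return matched, candidates


# Hypothetical authority list and extracted values.
authority_list = ["Sub-Sea Turbine", "Gearbox"]
extracted_values = ["  sub-sea   turbine", "GEARBOX", "Rotor Blade "]
matched, candidates = reconcile(extracted_values, authority_list)
```

Values that match the authority list are loaded in their canonical spelling, while unmatched values are set aside as candidates that a subject matter expert could review before the list is updated.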