Google Maps teaches several lessons about information governance that are worth considering for any major IG initiative that will take in information from many sources, normalize it, keep it current, and present it in a useful fashion.
1. Start with a bold vision. Stephen Covey would call this “begin with the end in mind.” Without a vision, it is difficult to select the proper tools, staffing, and processes. Without a bold vision, there are seldom quantum leaps forward in performance or capabilities. Have a vision for all your documents, not just discrete silos.
2. Normalize content. Just as Google has to normalize content received from cities, counties, states, and the federal government, organizations have to be able to normalize their documents, whether they’re on paper or held in file shares, Documentum, SharePoint, or FileNet. Without normalization, true integration of content and the removal of duplicate content are impossible.
3. Be data-driven. Successful initiatives require and use hard data points. Measure things like throughput and consistency. If you want 15 million documents migrated within 18 months and only 20,000 have been processed after 3 months, you are far behind schedule. When the same document is processed in different batches or by different reviewers, do you get the same result? Measuring the right data provides timely alerts to these sorts of issues.
4. Eat the elephant one bite at a time. Google didn’t have camera cars to cover all the roads in one day, but it did have priorities and a schedule. Any successful IG initiative will use processes with reasonably predictable throughputs.
5. Have varying granularity for different purposes. Google Maps lets you zoom in to the street level or zoom out to provide an overview of a 1,500-mile road trip. When dealing with documents, the granularity should extend down to the glyph or graphical element level, and let you deal with successively larger objects, e.g., words, lines, paragraphs, pages, documents, similar documents, document types, and business owners. Depending on what you’re trying to do, each of those levels will be useful.
6. Plan for constantly morphing content. You’re never done: information governance is a process, not an event. New types of documents are constantly being invented, and old ones morph over time, changing 15-20% per year. Your information governance initiative needs to anticipate how new and morphing documents will be identified.
7. Embrace scale. Don’t let the scalability of your present tool set define the scope of your vision. If you’re already doing a good job with the present tools but major problems remain or major opportunities are being missed, find solutions that scale to handle your content.
8. Provide multiple use scenarios. Google Maps delivers its content in a variety of formats, ranging from high-level trip overviews to turn-by-turn instructions to street-level camera views. A well-thought-out IG initiative provides multiple “views” of documents: sometimes it will be the native file to enable further editing, sometimes it will be locked PDFs for scanned copies of contracts, and sometimes it will be data extracted from documents to facilitate reporting or calculations based on the data points.
9. Provide findability. Google not only provides users with maps, it also helps them get to the right map by tagging map locations with terms that help identify those sections of the maps. Digitizing content without findability basically amounts to hiding documents in plain sight: you know where they are in a general sense, but you can’t get to the ones you want.
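To make the data-driven lesson (item 3) concrete, here is a minimal Python sketch of the two measurements it mentions: throughput against a deadline and consistency across repeated processing. The function names, the straight-line schedule, and the sample labels are assumptions made for illustration; only the 15 million documents, 18 months, 3 months, and 20,000 processed come from the example above.

```python
from collections import defaultdict


def migration_status(total_docs, deadline_months, elapsed_months, processed):
    """Compare observed throughput with a straight-line schedule (an assumption)."""
    observed_rate = processed / elapsed_months              # documents per month so far
    expected_so_far = total_docs * elapsed_months / deadline_months
    remaining = total_docs - processed
    months_left = deadline_months - elapsed_months
    required_rate = remaining / months_left                 # rate needed to hit the deadline
    months_at_current_rate = (
        remaining / observed_rate if observed_rate > 0 else float("inf")
    )
    return {
        "observed_rate": observed_rate,
        "expected_so_far": expected_so_far,
        "required_rate": required_rate,
        "months_at_current_rate": months_at_current_rate,
        "on_track": processed >= expected_so_far,
    }


def consistency_rate(results):
    """Share of re-processed documents that received the same result every time.

    `results` is an iterable of (doc_id, label) pairs, one per processing pass.
    Documents processed only once are ignored. Returns None if nothing was
    processed more than once.
    """
    labels_by_doc = defaultdict(list)
    for doc_id, label in results:
        labels_by_doc[doc_id].append(label)
    repeated = [labels for labels in labels_by_doc.values() if len(labels) > 1]
    if not repeated:
        return None
    consistent = sum(1 for labels in repeated if len(set(labels)) == 1)
    return consistent / len(repeated)


# Figures from item 3: 15 million documents, 18 months, 20,000 done after 3 months.
status = migration_status(15_000_000, 18, 3, 20_000)

# Hypothetical review results: document "a" is labeled consistently, "b" is not.
sample = [("a", "invoice"), ("a", "invoice"), ("b", "contract"), ("b", "lease"), ("c", "memo")]
rate = consistency_rate(sample)
```

Under these assumptions, a straight-line schedule expects 2.5 million documents after 3 months, and the remaining 15 months would need roughly a million documents per month, which is the kind of early alert item 3 describes. A real initiative would likely replace the straight-line schedule with its own ramp-up plan.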
Visual classification is a scalable approach to document classification, attribute extraction, and enhanced document find functionality. It normalizes paper and electronic content to provide for unified processes and retention policies across content types and silos. Its self-forming visual grouping of documents provides a data-driven way to identify existing document types and provides alerts when new document types start being used or when old ones morph significantly.
Visual classification is text-independent and does not need or use text for its classification of documents.