As the year winds down, I’ve looked back over the 40+ posts I’ve written so far this year to see which had the highest readership. Quite a few people apparently found the top-viewed posts useful, so here is a summary of them for anyone who might have missed them earlier:
Converting Unstructured Content to Structured Content (April, 2016), 1,500+ views. How to map structured database fields to definable attributes in specific document types. Useful for populating or for auditing process-control systems supported with unstructured content/documents.
Why Embedding Referential Metadata in PDFs is a Good Idea (Feb., 2016), 1,000+ views. Why including referential metadata can be helpful, e.g., including the original file path so that the path is searchable. Also useful to include inferential metadata, e.g., tagged document attributes.
REALLY Compressing PDFs (Jan., 2016), 1,000+ views. How using multiple compression algorithms on the same PDF can save considerable storage and transmission resources. The PDF standard supports multiple algorithms, but they are not often used.
Limitations of Using OCR for File Classification (July, 2016). Discusses drawbacks of OCRed text, including font size, its single-dimensional nature, case sensitivity, non-textual glyphs, language restrictions, non-symmetrical DPI for faxes, and logical document boundary issues.
Bottom-Up ECM Classification (August, 2016). How self-forming document clusters become the basis for data-driven ECM classifications. With bottom-up classification, content managers can overcome the gaps in their awareness of all the document types contained in large collections.
I’ve covered these and many related topics in my book, Guide to Managing Unstructured Content, which is available for free download at http://beyondrecognition.net/download-john-martins-guide-to-managing-unstructured-content/
I want to wish all my blog readers happy holidays and a prosperous New Year!