The first step in email remediation, faceted deduplication, involves comparing entire objects, either message attachment groups or attachments, to see if they match at the whole object level. This posting focuses on the second step, intradeduping embedded emails, which involves identifying where earlier emails are included within or embedded in later emails. This is an […]
There are profound differences in the capabilities of a glyph-based document processing engine compared to legacy optical character recognition (“OCR”) systems. From a process efficiency viewpoint, OCR treats each potential character as a fresh recognition task, meaning that even if precisely the same pattern of pixels had already been recognized, that same pattern will be put […]
Document-type taxonomy systems should be consistent in document classifications, be complete in accounting for both records and non-record documents, and remain timely as new document types are encountered and old document types evolve over time. This posting discusses how visual classification technology meets those three criteria. Consistency: To be useful at all, taxonomy systems should […]
Today Amazon Web Services informed BeyondRecognition that AWS has taken down the Enron Email Data Set. This is the message we received:
From: Amazon EC2 Abuse <firstname.lastname@example.org>
Date: May 6, 2013, 4:36:57 PM CDT
To: “email@example.com” <firstname.lastname@example.org>
Subject: Your Amazon EC2 Abuse Report
Reply-To: Amazon EC2 Abuse <email@example.com>
Hello John, We apologize for the delayed response to your report. […]
Why BR is better!
A recent AIIM survey report, “Information Governance, records, risks and retention in the litigation age,” highlights issues faced by organizations in trying to manage their documents: custodian-based classification doesn’t work; disk storage is steadily growing with no end in sight; nobody ever seems to delete any electronic records; organizations want to unify their treatment […]