Computer scientist to advise Memphis startup as it launches innovative document analysis technologies
Germantown, TN (July 31, 2012). John Martin, founder and CEO of BeyondRecognition, LLC, announced today that Stephen V. Rice, Ph.D., has agreed to join BeyondRecognition’s Advisory Board and to consult with BeyondRecognition. Martin said, “One of the major functionalities provided by BeyondRecognition is our ability to extract textual and non-textual glyphs from document images. Dr. Rice’s scientific background uniquely qualifies him to help us create objective measures of the accuracy of that conversion process at the character, word, and significant-word level.”
For five years, Dr. Rice conducted the first large-scale independent evaluations of commercial optical character recognition (OCR) systems while at the Information Science Research Institute of the University of Nevada, Las Vegas (UNLV). As part of that work, he developed sequence comparison algorithms to measure OCR accuracy. He is the author of the classic book, “Optical Character Recognition: An Illustrated Guide to the Frontier,” which is essential reading for anyone involved in developing OCR or CAPTCHA systems.
Rice noted that, “BeyondRecognition has taken a fresh approach to the challenge of extracting text from document images. They have developed several innovative technologies for document conversion and retrieval. I look forward to assisting them as they continue to break new ground in this area.”
Martin continued, “For the past year we have been building out our code base and our infrastructure, and we will be looking to Dr. Rice to help us develop statistically sound measures to evaluate our performance. For example, next month we plan to benchmark our system on the approximately 30 million page images of tobacco litigation documents obtained from the Text Retrieval Conference (TREC) sponsored by the National Institute of Standards and Technology (NIST). Our goal is to convert the 30 million pages to text, globally edit the text, index it, and be able to perform sub-second retrieval on any page in the collection, all within 72 hours, start to finish. We want valid, reliable metrics to use to report the results.”
As to the significance of the document collection, Martin observed, “The TREC Tobacco documents have been used by the TREC Legal Track to gauge the efficacy of various text retrieval systems and methodologies, despite known issues with inaccurate OCR. The studies compared the results from queries or processing to the documents identified by manual reviews, and the results have been used to argue in favor of ‘predictive coding’ or ‘technology-assisted review.’ We believe that the new, more accurate text output by BeyondRecognition will let researchers show how the early studies may have understated the effectiveness of technology-assisted review.”
About Dr. Rice
Dr. Stephen V. Rice consults as a computer scientist and software engineer with expertise in algorithms, computer audio, computer simulation, database systems, pattern recognition, programming languages, and related areas. He was a computer science professor at the University of Mississippi, was chief software engineer at the UNLV Information Science Research Institute, and is Founder and CTO of Comparisonics Corporation. He served on the board of advisors of the Federal Intelligent Document Understanding Laboratory of the U.S. Central Intelligence Agency.
For more about Dr. Rice, see http://www.stephenvrice.com.
About BeyondRecognition
BeyondRecognition has developed unique character, word, and document attribute recognition and extraction capabilities for analyzing image-based documents. Its glyph clustering and cataloging approach enables rapid, globally editable text recognition with accuracy rates far beyond traditional OCR. BeyondRecognition also clusters documents based on visual similarity and permits location-based, cluster-specific data element extraction for coding or abstracting data elements from the documents. Clustering by document type permits prioritized data element extraction, using the powerful graphical user interface to highlight zones and to write, instantly test, and verify extraction rules.
Although nominally a “startup,” the principal technologists at BeyondRecognition have been working in the fields of document conversion, electronic evidence forensics and processing for decades. CEO John Martin was previously a founder of Cricket Technologies and RedFile LLC.
# # #