The mortgage industry faces a significant information management challenge: determining whether all required documents are present in loan files and whether the information in them is accurate, or at least consistent. Organizations buying, selling, or auditing mortgages need to meet that challenge before they can determine the value of the loans or the risks associated with them.
This page details how visual classification can be used to meet the loan document classification challenge. The technology is provided by BeyondRecognition (“BR”), a privately held company specializing in advanced document classification. It groups, or clusters, visually similar documents and enables clients to classify those documents using a three-level document classification tree.
With visual classification, financial institutions and regulators can be confident that they know what documents are in a loan file, whether the file is complete, whether some document types occur more than once in a file, and what key data elements have been entered on the documents.
Loan File Analysis
Individual loan files contain a number of required documents, including:
- Appraisal Reports
- Closing Documents
- Credit search authorizations
- First Payment Letter
- Flood Certificates
- Good Faith Estimates
- Inspection reports
- Lender Fees
- Loan applications
- Mortgage Insurance
- Mortgage Notes
- Offers to Purchase
- Title searches
- Truth-in-Lending Disclosures
- VA Loan Analysis
- Verification of Deposit
- Verification of Employment
The key to visual classification is that within document types, the documents look alike. For example, you can view the set of unorganized documents in the first graphic and pick out certain document types even if you can’t read the individual words on the documents.
Workflow. This is a high-level workflow for loan file analysis from the client’s perspective. Each phase is discussed below.
Document Classification. Once the loan files have been ingested, BR automatically groups or clusters the documents based on visual similarity, and the client’s subject matter experts determine what document-type label should be applied to each cluster. This label is then assigned to all documents in the cluster, including any that are added to it later.
After the initial week or two of reviewing clusters to assign document types, most documents will fall into previously designated clusters. Note that several clusters will generally be assigned the same document type because of differences in their appearance, e.g., there may be several clusters that are all called “lien release.”
This process is extremely accurate and consistent, and BR offers a guaranteed accuracy level above 99%. This consistent document classification enables automated approaches to confirming that, as a threshold matter, all required documentation is in place.
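As an illustration only, the automated threshold check might look like the following Python sketch. It is not BR's implementation: the cluster ids, the `CLUSTER_LABELS` mapping, the `REQUIRED_TYPES` subset, and the `check_loan_file` function are all hypothetical names chosen for this example. It assumes each document in a loan file already carries a cluster id and that experts have labeled some clusters.

```python
from collections import Counter

# Hypothetical labels, each chosen once per cluster by a subject matter expert.
# Two visually different clusters (c2, c3) can share one document type.
CLUSTER_LABELS = {
    "c1": "Loan applications",
    "c2": "Mortgage Notes",
    "c3": "Mortgage Notes",
    "c4": "Appraisal Reports",
}

# Hypothetical subset of the required document types for one loan program.
REQUIRED_TYPES = {
    "Loan applications",
    "Mortgage Notes",
    "Appraisal Reports",
    "Flood Certificates",
}


def check_loan_file(doc_clusters, cluster_labels=CLUSTER_LABELS, required=REQUIRED_TYPES):
    """Summarize one loan file; doc_clusters maps document id -> cluster id."""
    # Documents whose cluster has not been labeled yet still need expert review.
    unlabeled = sorted(d for d, c in doc_clusters.items() if c not in cluster_labels)
    # Every document inherits the label of its cluster.
    counts = Counter(
        cluster_labels[c] for c in doc_clusters.values() if c in cluster_labels
    )
    missing = sorted(required - set(counts))
    duplicates = {doc_type: n for doc_type, n in counts.items() if n > 1}
    return {
        "complete": not missing,
        "missing": missing,
        "duplicates": duplicates,
        "unlabeled": unlabeled,
    }


loan_file = {"d1": "c1", "d2": "c2", "d3": "c3", "d4": "c9"}
report = check_loan_file(loan_file)
```

Because labels attach to clusters rather than to individual documents, a check like this can run over an entire portfolio without any per-document review.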
Document Unitization or Boundaries. One of the recurring problems in classifying loan file documents is that document boundaries, or unitization, are often wrong. This is caused by the way the documents are initially faxed, scanned, or assembled into multi-document PDFs or TIFFs. Most content management systems are configured so that there can be just one document type, author, and document date per “document.” When a file contains multiple documents, the second and subsequent documents are essentially invisible or hidden.
BR technology learns what the beginning pages of documents look like and can be used to identify proper document boundaries within multi-document files as an optional part of the classification process. This makes all downstream analysis and processing far more accurate and reliable. When tested, BR document unitization has been found to be more consistent and reliable than manual document unitization.
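The boundary step can be pictured with a small sketch. It assumes some upstream classifier has already produced one boolean per page indicating whether that page looks like the first page of a document. That classifier, the `first_page_flags` input, and the `split_into_documents` function are assumptions made for illustration, not BR's API.

```python
def split_into_documents(first_page_flags):
    """Group page indexes into documents; a True flag starts a new document."""
    documents = []
    for page_index, is_first in enumerate(first_page_flags):
        # The very first page always opens a document, even if it was not flagged.
        if is_first or not documents:
            documents.append([page_index])
        else:
            documents[-1].append(page_index)
    return documents


# A six-page scan that actually holds three documents.
flags = [True, False, False, True, True, False]
units = split_into_documents(flags)
```

Each resulting page group can then be classified and attributed as its own document instead of being hidden behind the first document in the file.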
Zonal Attribute Extraction. Each document type in a loan file has certain types of information or attributes that are expected to be there, e.g., full name and social security number on the loan application. BR provides a graphical user interface where subject matter experts can specify where on each document type each data element or attribute is expected to occur. This is done only once for each cluster, and because a single cluster can contain tens or even hundreds of thousands of documents, the modest time spent doing this yields enormous efficiencies.
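A minimal sketch of the idea behind zonal extraction follows. It assumes zones are stored per cluster as rectangles in page-fraction coordinates and that OCR yields words with center positions. The `Zone` and `Word` types, the `CLUSTER_ZONES` table, and the coordinates are hypothetical, and each zone is assumed to cover a single line of text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    attribute: str
    left: float    # all bounds are fractions of page width or height
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # word center, fraction of page width
    y: float  # word center, fraction of page height


# Hypothetical zones, drawn once per cluster by a subject matter expert.
CLUSTER_ZONES = {
    "c1": [
        Zone("borrower_name", 0.10, 0.10, 0.60, 0.15),
        Zone("ssn", 0.65, 0.10, 0.95, 0.15),
    ],
}


def extract_attributes(cluster_id, words):
    """Collect the words that fall inside each zone defined for the cluster."""
    result = {}
    for zone in CLUSTER_ZONES.get(cluster_id, []):
        hits = [w for w in words if zone.contains(w.x, w.y)]
        hits.sort(key=lambda w: w.x)  # left-to-right reading order on one line
        result[zone.attribute] = " ".join(w.text for w in hits)
    return result


words = [
    Word("Jane", 0.15, 0.12),
    Word("Doe", 0.25, 0.12),
    Word("000-00-0000", 0.80, 0.12),
    Word("Uniform", 0.30, 0.03),  # page title, outside every zone
]
attrs = extract_attributes("c1", words)
```

The zone table is keyed by cluster, so the one-time cost of drawing zones is shared by every document that lands in that cluster.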
Quality Control. BR uses sophisticated algorithms and authority lists to determine whether it has successfully extracted the expected attributes from each document programmatically. Where no data elements or only low-confidence data elements were extracted, BR can send just the zone of the document needing manual intervention to outsourced data entry staff. In doing this, the zones are disaggregated or disassociated, meaning that the people performing the data entry are not able to associate multiple data elements from the same document, e.g., they don’t know which person’s name is associated with which social security number.
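The disassociation idea can be sketched as follows. This is a simplified illustration under stated assumptions, not BR's workflow: the `CONFIDENCE_THRESHOLD` value, the record fields, and the `build_entry_tasks` function are hypothetical. Each low-confidence zone becomes a task that carries only an opaque id and the zone image, while the mapping back to the document stays in-house.

```python
import random
import uuid

CONFIDENCE_THRESHOLD = 0.90  # hypothetical cut-off for automatic acceptance


def build_entry_tasks(extractions, threshold=CONFIDENCE_THRESHOLD, rng=None):
    """Return (tasks, key_map); tasks reveal nothing about their source document."""
    if rng is None:
        rng = random.Random()
    tasks = []
    key_map = {}  # kept internally: task id -> (document id, attribute)
    for item in extractions:
        if item["value"] and item["confidence"] >= threshold:
            continue  # accepted automatically, no manual entry needed
        task_id = uuid.uuid4().hex
        key_map[task_id] = (item["doc_id"], item["attribute"])
        tasks.append({"task_id": task_id, "zone_image": item["zone_image"]})
    # Shuffle so that task order does not hint at which zones belong together.
    rng.shuffle(tasks)
    return tasks, key_map


extractions = [
    {"doc_id": "d1", "attribute": "borrower_name", "value": "Jane Doe",
     "confidence": 0.99, "zone_image": "zone_a.png"},
    {"doc_id": "d1", "attribute": "ssn", "value": "",
     "confidence": 0.0, "zone_image": "zone_b.png"},
    {"doc_id": "d2", "attribute": "borrower_name", "value": "J0hn Sm1th",
     "confidence": 0.41, "zone_image": "zone_c.png"},
]
tasks, key_map = build_entry_tasks(extractions, rng=random.Random(0))
```

When keyed values come back, `key_map` is what lets them be written to the right attribute of the right document.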
Load File. The final output from the classification and attribution process is a load file that contains the assigned document type and the document attributes for each document. This can be in the form of CSV files or XML files. The output can also include content-enabled PDF images with both text and image layers. The client has full control over the type and format of the output.
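For the CSV case, a load file could be produced along these lines. The column names (`doc_id`, `doc_type`, and the attribute columns) and the `write_load_file` function are assumptions for illustration, since the actual layout is whatever the client specifies.

```python
import csv
import io


def write_load_file(records, attribute_names):
    """Render one CSV row per document: id, assigned type, then its attributes."""
    buffer = io.StringIO()
    fieldnames = ["doc_id", "doc_type", *attribute_names]
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",             # attributes a document lacks are left blank
        extrasaction="ignore",  # attributes not requested as columns are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        row = {"doc_id": record["doc_id"], "doc_type": record["doc_type"]}
        row.update(record.get("attributes", {}))
        writer.writerow(row)
    return buffer.getvalue()


records = [
    {"doc_id": "d1", "doc_type": "Mortgage Notes",
     "attributes": {"borrower_name": "Jane Doe", "loan_amount": "250000"}},
    {"doc_id": "d2", "doc_type": "Flood Certificates", "attributes": {}},
]
load_file = write_load_file(records, ["borrower_name", "loan_amount"])
```

Writing to an in-memory buffer keeps the sketch self-contained; a real export would write to a file or stream in the format the receiving system expects.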
Compliance. Loan files can be ingested and analyzed to determine if the loans have been properly documented per the regulations in force at the time. Subject matter experts can review single instances of individual document clusters to determine what factors need to be verified when assessing compliance. This changes compliance from a document-by-document ad hoc process to a repeatable business process model.
Auditing Loan Management Systems and Databases. Loan portfolios are typically managed in some form of content management system that has various underlying tables and fields to track attributes about each loan. Each of those data elements should be documented by an entry on a specific document type within the loan file. The BR classification and data extraction process can be used to match management system entries against the corresponding data elements pulled from the underlying documents.
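The matching step amounts to a field-by-field reconciliation. The sketch below is one simple way to express it, with hypothetical field names and a deliberately loose `normalize` rule (case and whitespace only). A real audit would need type-aware comparison for dates and amounts.

```python
def normalize(value):
    """Compare values case-insensitively and ignore extra whitespace."""
    return " ".join(str(value).split()).casefold()


def reconcile(system_record, extracted_record):
    """List fields where the loan system disagrees with, or lacks, document support."""
    discrepancies = []
    for field, system_value in system_record.items():
        if field not in extracted_record:
            # The system holds a value that no extracted document entry backs up.
            discrepancies.append((field, system_value, None, "undocumented"))
        elif normalize(system_value) != normalize(extracted_record[field]):
            discrepancies.append(
                (field, system_value, extracted_record[field], "mismatch")
            )
    return discrepancies


system_record = {
    "borrower_name": "Jane  Doe",
    "loan_amount": "250000",
    "property_zip": "12345",
}
extracted_record = {"borrower_name": "JANE DOE", "loan_amount": "255000"}
issues = reconcile(system_record, extracted_record)
```

Here the name matches after normalization, the loan amount is flagged as a mismatch, and the ZIP code is flagged as undocumented.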
Mergers and Acquisitions. When existing companies are acquired and their documentation needs to be integrated into the acquiring company’s loan management system, BR’s classification and attribute extraction process can be used to quickly normalize how loans are represented in control systems and to accurately ingest the new loan assets.
Loan Approval Processing. In addition to evaluating existing loan files for various reasons, visual classification and attribute extraction technology can also be used to speed ongoing loan applications, enabling better decision-making with fewer resources.