The key to many e-discovery document review challenges is to think of PDFs as content containers with almost limitless options for how they are filled, arranged, and navigated. PDFs are already used to store and distribute documents that can be numbered, labelled, and viewed without the software originally used to create the content. Here are some other powerful functions that PDFs can perform:
Native or PDF? Yes! (Embedding Native Files)
An ongoing debate in e-discovery is whether files should be produced in native format or in a standardized TIF/PDF format. With PDF, it isn’t a binary choice: the native files can be embedded in PDFs so that there is both a visible PDF version with page numbers and labels and the native file itself. The native format can be very useful, for example, in examining formulas or hidden columns in Excel or reviewing revisions or comments in Word.
Here’s an example of how an embedded file could be represented in the PDF. Clicking on the “pin” icon would cause a list of embedded files to be displayed and the user could right-click and open any of the files.
Self-Authenticating Embedded Files
Another embedding option is to use self-authenticating embedded file names: calculate the SHA hash values for the files to be embedded and use the hash values as the file names when they are embedded. Anyone who wants to confirm that the native files haven’t been altered can calculate the SHA values for the files and compare them to the names of the files.
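The naming and checking steps can be sketched in a few lines of Python. This is only an illustration under some assumptions: SHA-256 is used as the specific SHA variant, the original file extension is kept after the hash so the file still opens in the right application, and the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Return the SHA-256 hash of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large native files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def self_authenticating_name(path: Path) -> str:
    """Build the embedded file name: hash value plus the original extension.

    Keeping the extension is an assumption of this sketch, not a requirement.
    """
    return sha256_hex(path) + path.suffix.lower()


def is_unaltered(extracted: Path) -> bool:
    """Recalculate the hash of an extracted file and compare it to its name."""
    return sha256_hex(extracted) == extracted.stem.lower()
```

A reviewer who extracts an embedded file would run `is_unaltered()` on it; a `False` result means the content no longer matches the hash in its name.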
Message Attachment Groups as Single PDFs
Emails and their attachments are often processed as individual files. When attorneys review emails they may have to take several steps to examine attachments. Processing the message attachment group (“MAG”) as a single PDF permits the attorneys to review all the relevant files at one time with a minimum of wasted time.
Creating MAG PDFs can be combined with embedding native electronic files to make review even faster and easier.
Navigation – Bookmarking Multiple-Document PDFs
Bookmarks can be an invaluable aid in navigating large files, but they are easily overlooked in systems built on a paradigm of replicating paper review. For attorneys who review electronic files, bookmarks can be enormously useful. When constructing MAG PDFs, bookmarks can be used to help navigate to specific attachments.
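To place a bookmark for each attachment, you need to know the page on which each part of the MAG starts once everything is combined. Here is a minimal sketch of that calculation. The `MagPart` class and `bookmark_plan` function are hypothetical names for this example, and actually writing the bookmarks into the PDF is left to whatever PDF library you use.

```python
from dataclasses import dataclass


@dataclass
class MagPart:
    title: str       # bookmark label, e.g. the attachment's file name
    page_count: int  # pages this part occupies in the combined PDF


def bookmark_plan(parts):
    """Return (title, first_page) pairs using 1-based page numbers."""
    plan = []
    next_page = 1
    for part in parts:
        if part.page_count < 1:
            raise ValueError(f"{part.title!r} must have at least one page")
        plan.append((part.title, next_page))
        # The next part starts right after the last page of this one.
        next_page += part.page_count
    return plan


mag = [
    MagPart("Email message", 2),
    MagPart("budget.xlsx", 5),
    MagPart("memo.docx", 1),
]
print(bookmark_plan(mag))
# [('Email message', 1), ('budget.xlsx', 3), ('memo.docx', 8)]
```

The same calculation works for a scanned box of documents once the document boundaries are known, with the document classifications serving as the bookmark titles.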
Bookmarks are also useful for navigating large individual files, such as those created by scanning operations, where an entire box of documents can be represented in a single PDF file. Identifying document boundaries, classifying the documents, and using the classifications as bookmarks can greatly improve the usability of such files.
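If you add bookmarks for the convenience of the attorneys, be sure to set the PDF to display the bookmarks panel automatically when the file is opened (in PDF terms, the document's PageMode set to UseOutlines), as some reviewers may not think to open the bookmarks themselves.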
Transportable Classification and Fielded Metadata (Embedding Fielded or Tagged Data)
One of the ongoing challenges in e-discovery and content management is being able to move individual file or document objects while retaining relevant classifications or tags that may have become associated with them. Without the accompanying index or indices, the files may be essentially invisible in their new location. One answer is to embed the tags or classifications within the moved PDFs in a format that can be indexed by a variety of search or content management systems, even though they are not visible within normal document views. For example, the embedded content could include privileged status, document type, date, original file name, location of duplicates, etc.
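As a rough illustration, the tags could be turned into custom entries for the PDF's document information dictionary, which accepts keys beyond the standard Title, Author, and so on. The sketch below only prepares the key/value pairs. The `Review` prefix, the field names, and the `to_info_entries` function are all assumptions made for this example, and writing the entries into the file is left to your PDF library.

```python
def to_info_entries(tags, prefix="Review"):
    """Convert review tags into PDF-style metadata keys and string values.

    Keys become PDF names such as "/ReviewDocumentType". The prefix is a
    hypothetical convention to keep custom fields apart from standard ones.
    """
    entries = {}
    for field, value in tags.items():
        # Build a CamelCase key, since PDF names should not contain spaces.
        words = field.replace("_", " ").split()
        key = "/" + prefix + "".join(word.capitalize() for word in words)
        # Flatten multi-valued fields (e.g. duplicate locations) into one string.
        if isinstance(value, (list, tuple)):
            value = "; ".join(str(item) for item in value)
        entries[key] = str(value)
    return entries


tags = {
    "privileged": "Yes",
    "document_type": "Contract",
    "original file name": "Q3 budget.xlsx",
    "duplicates": ["Box 12/0041.pdf", "Box 17/0230.pdf"],
}
for key, value in to_info_entries(tags).items():
    print(key, "=", value)
```

Because the values are plain text stored with the document, a search or content management system that reads PDF metadata has a chance of indexing them even when the original review database is not available.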
For more information on using PDFs, see my earlier postings on:
- Really Compressing PDFs: http://beyondrecognition.net/really-compressing-pdfs/
- Why Embedding Referential Metadata in PDFs is a Good Idea: http://beyondrecognition.net/why-embedding-referential-metadata-in-pdfs-is-a-good-idea/