Update: Since our posting that Amazon Web Services had taken down the EDRM/FERC/Enron data set, EDRM and Nuix announced that Nuix had cleansed the Enron data set of more than 10,000 items containing private, health and financial information (http://www.edrm.net/archives/17490), and Index Engines announced that it had found “more dirt” in the EDRM/Nuix data set (http://www.poweroverinformation.com/index-engines-finds-more-dirt-on-nuixs-cleansed-enron-data-set/).
OK, everybody now acknowledges that the privacy genie is out of the bottle in terms of the EDRM/FERC/Enron email collection. What now?
Here are four specific suggestions for using these PII breaches as a learning experience and, coincidentally, for helping achieve data breach notification for the victims:
1. Crowd Source a Breach Notification List
EDRM volunteers come from a variety of organizations, many of which have considerable technical and human resources. They could divide up those documents from the EDRM Enron Email Data Set v2 found by Nuix or Index Engines to contain PII and create a data breach notification list of names, Social Security numbers, and last known contact information from those documents. The law schools and paralegal schools that teach e-discovery could also participate. The aggregated lists could be consolidated and then FERC and/or EDRM could provide notification. There would be costs, but certainly a sponsor could be found for that worthy purpose. From a learning perspective, working a few hours on generating a notification list should give budding law students or paralegals a real sense of how onerous data breach notification can be.
Perhaps volunteers could be solicited to staff a hotline to answer inquiries from data breach victims or to call victims whose only known contact information is a phone number.
2. Have a TREC Legal PII Track
TREC could have a PII identification track where various vendors would document how well they can identify all documents containing PII from the EDRM Enron Email Data Set v2 and extract the data needed for a breach notification list. The aggregated result set could be checked against the crowd-sourced notification list described above.
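To make the comparison concrete, here is a minimal sketch of how a vendor's flagged documents might be scored against the crowd-sourced list using precision and recall. The function name `score_vendor` and the document IDs are hypothetical, chosen only for illustration; TREC has not defined such a track, and a real evaluation would need agreed-upon document identifiers and a vetted reference list.

```python
def score_vendor(flagged_ids, reference_ids):
    """Compare a vendor's flagged documents with a reference list of PII documents.

    Both arguments are iterables of document IDs (hypothetical identifiers here).
    """
    flagged = set(flagged_ids)
    reference = set(reference_ids)
    true_positives = flagged & reference

    # Precision: share of flagged documents that really contain PII.
    precision = len(true_positives) / len(flagged) if flagged else 0.0
    # Recall: share of known PII documents that the vendor found.
    recall = len(true_positives) / len(reference) if reference else 0.0

    return {
        "precision": precision,
        "recall": recall,
        # Known PII documents the vendor failed to flag.
        "missed": sorted(reference - flagged),
        # Flagged documents absent from the reference list; these may be
        # false alarms or PII that the crowd-sourced review overlooked.
        "unconfirmed": sorted(flagged - reference),
    }


# Made-up example data, not drawn from the Enron data set.
reference_list = ["doc-001", "doc-002", "doc-003", "doc-004"]
vendor_a = ["doc-001", "doc-002", "doc-005"]

result = score_vendor(vendor_a, reference_list)
print(result)
```

For a breach notification effort, recall matters most, since every missed document may mean a victim who is never notified. The "unconfirmed" documents deserve a second human look rather than automatic dismissal, because the reference list itself may be incomplete.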
3. Crowd Source Data Set Enhancements
EDRM could coordinate the creation of documents that present specific issues in e-discovery, e.g., scanned images that don’t OCR, or password-encrypted files, so that students and researchers could test their abilities on the data that has live PII removed.
4. Ethics of Protecting PII
Data breaches are more than just fodder for columns, articles, blog posts, listserv repartee, and bar exam questions. They involve real people whose lives and sense of security are directly and significantly affected by the disclosure and dissemination of PII. All of us involved in e-discovery ought to be sensitive to that, especially those who are officers of the court. EDRM and bar associations ought to address protecting all PII, even PII that does not belong directly to a client.
Earlier posts on the EDRM/FERC/Enron data set
April 30, 2013: Lessons from the EDRM/FERC/Enron Data Privacy Breaches
May 6, 2013: Amazon Web Services Takes Down Enron Email Data Set
May 8, 2013: Background on EDRM/FERC/Enron PII Disclosures (FERC knew PII breaches were likely, 5th Circuit decision did not reach merits)
Photo Credit: Picture of blue Pinto taken from http://en.wikipedia.org/wiki/File:Bluepinto.jpg#file.