Common Exceptions and Special Cases, Processing Fundamentals Series Part 3

3 / 5

A multi-part series on the fundamentals eDiscovery practitioners need to know about the processing of electronically stored information

In “Why Understanding Processing is Important,” we discussed understanding processing in the context of lawyers’ duty of technology competence.  In “Key Activities and Common Tools,” we discussed the core processing activities and some of the tools used to complete them.  In this Part, we review common processing exceptions and some special cases.

Although a great many of the files encountered during ESI processing are common types that can be handled in a standardized, automated way, not all are.  Almost every processing effort encounters at least a few exceptions that cannot be handled without some manual intervention (if they can be handled at all).  Additionally, certain source types are special cases that routinely require custom work to process.  The handling of these exceptions and special cases can affect both project costs and the completeness of your data set.

Common Processing Exceptions

When a data set is run through one of the processing tools we discussed in the last Part, the tool will often encounter some files that it cannot automatically process.  Most commonly, these exceptions occur because the tool has encountered corrupt data, a password-protected file, or an unknown file type.  In any of these cases, the exception is logged and reported for later review by the processor and, when needed, by the case team.

Corrupt Data

Corrupt data is data that is physically incomplete or unreadable in some way.  This can be the result of a faulty sector in the storage medium, the result of a copying error at some point in the chain of custody, or the result of the original source itself having been corrupted.  In some cases, content can be recovered from corrupted file data, but in many cases, retrieval of a non-corrupt version from the source – if available – is required.
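One common way to catch a copying error is to compare hash values for the source file and the copy.  The sketch below is only an illustration of that idea, not a description of how any particular processing tool works; the function names and the sample file names are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source_path, copy_path):
    """True when the source and the copy hash to the same value."""
    return sha256_of(source_path) == sha256_of(copy_path)


def demo_hash_check():
    """Write a file, an exact copy, and a damaged copy, then compare each copy to the original."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "original.bin")
        exact_copy = os.path.join(folder, "exact_copy.bin")
        damaged_copy = os.path.join(folder, "damaged_copy.bin")
        payload = b"custodian mailbox export" * 1000
        # The damaged copy differs from the original by a single byte.
        samples = (
            (original, payload),
            (exact_copy, payload),
            (damaged_copy, payload[:-1] + b"?"),
        )
        for path, data in samples:
            with open(path, "wb") as handle:
                handle.write(data)
        return copy_is_intact(original, exact_copy), copy_is_intact(original, damaged_copy)
```

A mismatch shows only that the copy differs from the source.  It does not say where the damage occurred, and it cannot help when the original source was itself already corrupt.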

Password-Protected Files

Many file types – from PST files to PDF files to Microsoft Office documents – can be protected with passwords by their authors or owners.  Files that are protected by passwords cannot be automatically processed until they are unlocked so that their content can be extracted.  In some cases, such passwords can be cracked or bypassed through the use of specialized tools, but in most cases, getting the password from the author or owner is the fastest and cheapest solution.
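How a tool detects protection differs by format.  As one narrow illustration, the ZIP format marks an encrypted entry by setting bit 0 of that entry's general-purpose flags, and Python's standard zipfile module exposes those flags.  The sketch below covers only that ZIP case; PST, PDF, and Office files use different mechanisms, and the helper names here are hypothetical.

```python
import io
import zipfile

# Bit 0 of a ZIP entry's general-purpose flags marks the entry as encrypted.
ENCRYPTED_FLAG = 0x1


def is_encrypted(entry):
    """True when a zipfile.ZipInfo entry carries the encryption flag."""
    return bool(entry.flag_bits & ENCRYPTED_FLAG)


def encrypted_entries(zip_source):
    """List the names of entries in a ZIP archive that are flagged as encrypted."""
    with zipfile.ZipFile(zip_source) as archive:
        return [entry.filename for entry in archive.infolist() if is_encrypted(entry)]


def build_plain_zip():
    """Build a small unprotected archive in memory for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("memo.txt", "not protected")
    buffer.seek(0)
    return buffer
```

A check like this only flags the file for the exception log.  Unlocking it still requires the password or a specialized tool.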

Unknown File Types

Although most processing tools can identify and handle hundreds of common file types, there are thousands of file types in existence – including countless proprietary file types generated by custom corporate tools and systems.  When a processing tool encounters a file whose type it does not recognize, or for which it cannot identify a type at all, it logs an exception for later review by the processor.  If the type is unknown, manual expert review may be able to identify it.  If the type is known but uncommon or proprietary, custom work may be required to process it – if it can be processed at all.
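File types are often identified by the signature bytes at the start of a file rather than by the file extension.  The sketch below illustrates that approach with a handful of well-known signatures; the table is deliberately tiny, and the function names and the shape of the exception log are assumptions made for illustration, not any tool's actual behavior.

```python
# A few well-known leading signatures; real tools recognize far more types.
KNOWN_SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container (also DOCX, XLSX, PPTX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (legacy Office, MSG)"),
    (b"!BDN", "Outlook PST/OST"),
]


def identify(header_bytes):
    """Return a label for the first matching signature, or None if nothing matches."""
    for signature, label in KNOWN_SIGNATURES:
        if header_bytes.startswith(signature):
            return label
    return None


def triage(file_name, header_bytes, exception_log):
    """Identify a file by its leading bytes and log an exception when the type is unknown."""
    label = identify(header_bytes)
    if label is None:
        exception_log.append({"file": file_name, "reason": "unknown file type"})
    return label
```

For example, calling triage("export.dat", b"\x00\x01\x02", log) returns None and appends one entry to log, which is the kind of record a processor would later review by hand.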

Source Type Special Cases

In addition to the unpredictable occurrence of processing exceptions, there is a range of source types that will almost always require some custom work and some additional decisions to handle.  Examples include:

  • Backup Tapes – working with materials from backup tapes can require restoration of large volumes of data to obtain the desired files for processing; with some tools, tapes can be indexed prior to restoration to allow for targeted restoration instead
  • Mobile Devices – data from mobile devices frequently contains databases and other aggregated types of data that require custom parsing into discrete records, for example: contacts databases, email databases, call logs, etc.
  • Social Media – data from social media websites and applications can, like mobile device data, include aggregated data types that must be parsed out into individual posts or messages, and linked content or specialized metadata fields may require custom work
  • Instant Messaging – these programs too, whether social messaging services or professional collaboration tools like Slack, often store communications in aggregated files that require custom parsing to separate into individual conversations or messages
  • Structured Data – structured data includes the large operational databases that underpin corporate systems like CMS, ERM, and more, and custom work is generally required to identify and capture the right portion of that data and then present it in a usable form
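
Several of the source types above keep many communications in a single aggregated file, often a small database.  The sketch below shows the general idea of parsing such a file into discrete records using an in-memory SQLite database.  The table name, column names, and sample rows are invented for illustration; real mobile, social media, and messaging stores each have their own schemas and usually need format-specific handling.

```python
import sqlite3


def build_sample_database():
    """Create an in-memory database standing in for an aggregated message store."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, sender TEXT, sent_at TEXT, body TEXT)"
    )
    connection.executemany(
        "INSERT INTO messages (sender, sent_at, body) VALUES (?, ?, ?)",
        [
            ("bob", "2020-01-02T09:15:00", "Sending the draft now."),
            ("alice", "2020-01-02T09:00:00", "Is the draft ready?"),
        ],
    )
    connection.commit()
    return connection


def extract_records(connection):
    """Return each stored message as its own dictionary, ordered by time sent."""
    connection.row_factory = sqlite3.Row
    rows = connection.execute(
        "SELECT id, sender, sent_at, body FROM messages ORDER BY sent_at"
    )
    return [dict(row) for row in rows]
```

Each dictionary returned by extract_records(build_sample_database()) can then be treated as an individual record for review, with the sender and timestamp carried along as metadata.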

Upcoming in this Series

In the next Part, we will review de-NISTing, deduplication, and other objective culling that can take place during processing.

About the Author

Matthew Verga

Director of Education

Matthew Verga is an electronic discovery expert proficient at leveraging his legal experience as an attorney, his technical knowledge as a practitioner, and his skills as a communicator to make complex eDiscovery topics accessible to diverse audiences. A fourteen-year industry veteran, Matthew has worked across every phase of the EDRM and at every level from the project trenches to enterprise program design. He leverages this background to produce engaging educational content to empower practitioners at all levels with knowledge they can use to improve their projects, their careers, and their organizations.
