A multi-part series on the fundamentals eDiscovery practitioners need to know about the processing of electronically stored information
In “Why Understanding Processing is Important,” we discussed why understanding processing matters in the context of lawyers’ duty of technology competence. In “Key Activities and Common Tools,” we discussed the core processing activities and some of the tools used to complete them. In this Part, we review common processing exceptions and some special cases.
Although a great many of the files encountered during ESI processing are common types that can be handled in a standardized, automated way, not all are. Almost every processing effort encounters at least a few exceptions that cannot be handled without some manual intervention (if they can be handled at all). Additionally, certain source types are special cases that routinely require custom work to process. How these exceptions and special cases are handled can affect both project costs and the completeness of your data set.
During the processing of a data set in one of the processing tools we discussed in the last Part, the tool will often encounter some files that it cannot automatically process. Most commonly, these exceptions occur because the tool has encountered corrupt data, a password-protected file, or an unknown file type. In each of these cases, the exception is logged and reported for later review by the processor and, when needed, by the case team.
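To make the logging step concrete, here is a minimal Python sketch, assuming a hypothetical extractor that signals each of the three cases by raising a distinct error. The names used (`CorruptDataError`, `ProcessingRun`, `demo_extract`, and so on) are illustrative only and do not correspond to any particular processing tool's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Iterable, List


class ExceptionReason(Enum):
    CORRUPT = "corrupt data"
    PASSWORD_PROTECTED = "password-protected file"
    UNKNOWN_TYPE = "unknown file type"


# Hypothetical errors that an extraction step might raise for each case.
class CorruptDataError(Exception):
    pass


class PasswordProtectedError(Exception):
    pass


class UnknownFileTypeError(Exception):
    pass


REASON_FOR_ERROR = {
    CorruptDataError: ExceptionReason.CORRUPT,
    PasswordProtectedError: ExceptionReason.PASSWORD_PROTECTED,
    UnknownFileTypeError: ExceptionReason.UNKNOWN_TYPE,
}


@dataclass
class ExceptionLogEntry:
    path: str
    reason: ExceptionReason
    detail: str


@dataclass
class ProcessingRun:
    processed: List[str] = field(default_factory=list)
    exception_log: List[ExceptionLogEntry] = field(default_factory=list)

    def process(self, paths: Iterable[str], extract: Callable[[str], str]) -> None:
        """Extract each file; log the three known failure types instead of halting."""
        for path in paths:
            try:
                extract(path)
            except tuple(REASON_FOR_ERROR) as err:
                # Map the error (or a subclass of it) back to its logged reason.
                reason = next(r for cls, r in REASON_FOR_ERROR.items()
                              if isinstance(err, cls))
                self.exception_log.append(ExceptionLogEntry(path, reason, str(err)))
            else:
                self.processed.append(path)


def demo_extract(path: str) -> str:
    """Hypothetical stand-in for a real extractor, keyed on file name for the demo."""
    if path.endswith(".locked.pdf"):
        raise PasswordProtectedError("PDF requires an open password")
    if path.endswith(".bin"):
        raise UnknownFileTypeError("no matching file signature")
    return f"text of {path}"
```

Running `ProcessingRun().process(["memo.docx", "budget.locked.pdf", "export.bin"], demo_extract)` processes the first file and logs the other two with their reasons. Errors outside the three expected categories are deliberately left to propagate, on the theory that an unexpected failure should stop the run and be investigated rather than quietly join the exception log.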
Corrupt data is data that is physically incomplete or unreadable in some way. This can result from a faulty sector in the storage medium, from a copying error at some point in the chain of custody, or from corruption of the original source itself. In some cases, content can be recovered from corrupted file data, but in many cases, retrieval of a non-corrupt version from the source – if available – is required.
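One common way to detect a copying error along the chain of custody is to compare a cryptographic hash of the working copy against a hash recorded when the data was collected. The sketch below assumes such a SHA-256 value was recorded at collection; the function names are hypothetical. A mismatch shows only that the bytes changed, not why or where.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large evidence files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(copy_path: Path, expected_sha256: str) -> bool:
    """True if the working copy matches the hash recorded at collection time."""
    # hexdigest() is lowercase, so normalize the recorded value before comparing.
    return sha256_of(copy_path) == expected_sha256.lower()
```

If `verify_copy` returns `False`, the natural next step is the one described above: go back to the source, if it is still available, for a non-corrupt copy.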
Many file types – from PST files to PDF files to Microsoft Office documents – can be protected with passwords by their authors or owners. Password-protected files cannot be automatically processed until they are unlocked so that their content can be extracted. In some cases, such passwords can be cracked or bypassed with specialized tools, but in most cases getting the password from the author or owner is the fastest, cheapest solution.
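Some password protection can be detected before any extraction is attempted. As one example, the ZIP format marks each encrypted entry by setting bit 0 of its "general purpose bit flag," which Python's standard `zipfile` module exposes as `ZipInfo.flag_bits`. This sketch covers ZIP archives only; password-protected PDFs, Office documents, and PST files use other mechanisms that it does not detect. The helper name `encrypted_members` is hypothetical.

```python
import zipfile
from typing import BinaryIO, List, Union

# Bit 0 of the ZIP "general purpose bit flag" marks an encrypted entry.
ENCRYPTED_FLAG = 0x1


def encrypted_members(source: Union[str, BinaryIO]) -> List[str]:
    """Return the names of archive members flagged as encrypted."""
    with zipfile.ZipFile(source) as zf:
        return [info.filename for info in zf.infolist()
                if info.flag_bits & ENCRYPTED_FLAG]
```

Any names returned would be candidates for the password-protected branch of the exception log, pending a password from the custodian.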
Although most processing tools can identify and handle hundreds of common file types, there are thousands of file types in existence – including countless proprietary file types generated by custom corporate tools and systems. When a processing tool encounters a file whose type it cannot identify at all, or whose type it identifies but cannot handle, it logs an exception for later review by the processor. If the type is unknown, manual expert review may be able to identify it; if the type is known but uncommon or proprietary, custom work may be required to process it – if it can be processed at all.
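Many processing tools identify file types at least partly from content rather than trusting the file extension, typically by matching a file's leading bytes against known signatures ("magic numbers"). The sketch below uses a tiny illustrative signature table; real tools rely on far larger databases, and the names `SIGNATURES`, `identify`, and `unidentified` are hypothetical. A match identifies only the container: a ZIP signature, for example, does not by itself distinguish a DOCX from a plain ZIP archive.

```python
from typing import Dict, List, Optional

# A few well-known file signatures, checked against a file's first bytes.
# Illustrative only: production signature databases cover far more types.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office, MSG)"),
    (b"PK\x03\x04", "ZIP container (also DOCX/XLSX/PPTX)"),
    (b"!BDN", "Outlook PST/OST"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a type label for a file's leading bytes, or None if no signature matches."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def unidentified(headers: Dict[str, bytes]) -> List[str]:
    """Names whose leading bytes match no known signature (exception-log candidates)."""
    return [name for name, head in headers.items() if identify(head) is None]
```

Files returned by `unidentified` correspond to the unknown-type case above: they would be logged for manual expert review rather than processed.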
In addition to the unpredictable occurrence of processing exceptions, there is a range of source types that will almost always require some custom work and some additional decisions to handle. Examples include:
Upcoming in this Series
In the next Part, we will review de-NISTing, deduplication, and other objective culling that can take place during processing.