Some Final Steps and Key Takeaways, Processing Fundamentals Series Part 5


A multi-part series on the fundamentals eDiscovery practitioners need to know about the processing of electronically stored information

In “Why Understanding Processing is Important,” we discussed understanding processing in the context of lawyers’ duty of technology competence.  In “Key Activities and Common Tools,” we discussed the core processing activities and some of the tools used to complete them.  In “Common Exceptions and Special Cases,” we discussed scenarios requiring extra work and decisions during core processing activities.  In “Objective Culling Options,” we discussed de-NISTing, deduplication, and content filtering.  In this final Part, we review final steps and key takeaways.

In addition to the core activities of expansion, extraction, normalization, indexing, and objective culling that we have already discussed, there can be a variety of additional steps required during processing to prepare the materials for subsequent early case assessment, review, and production activities.

Potential Additional Steps

Depending on the platform in which the material will be used and the ways that it will be used, additional steps may be required to finish preparing it for those activities.  For example, we noted in the last Part that it is not uncommon to create and populate a custom master date field that integrates values from different date/time fields associated with different file types.  It is also common to create other custom metadata fields, such as a field that extracts the domain names associated with email addresses, or a field that documents collection source details such as custodian or directory.  The specific fields to be created will depend on the material with which you will be working and what you hope to accomplish with it during ECA and review.
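As a minimal sketch of what populating such custom fields can look like, the Python below derives a master date, a list of email domains, and a custodian value for a single document record. The field names (`DateSent`, `MasterDate`, `EmailDomains`, and so on), the date priority order, and the semicolon-separated bare-address format are all assumptions made for illustration; real processing platforms expose their own field names and configuration options.

```python
from datetime import datetime
from typing import Optional

# Hypothetical source field names, listed in the order we prefer them.
DATE_FIELD_PRIORITY = ["DateSent", "DateReceived", "DateLastModified", "DateCreated"]
ADDRESS_FIELDS = ("From", "To", "CC", "BCC")


def master_date(record: dict) -> Optional[datetime]:
    """Return the first populated date field, in priority order."""
    for field in DATE_FIELD_PRIORITY:
        value = record.get(field)
        if value:
            # Assumes dates were already normalized to ISO 8601 strings.
            return datetime.fromisoformat(value)
    return None


def email_domains(record: dict) -> list:
    """Collect the distinct, lowercased domains found in the address fields."""
    domains = set()
    for field in ADDRESS_FIELDS:
        # Assumes bare addresses separated by semicolons.
        for address in (record.get(field) or "").split(";"):
            address = address.strip()
            if "@" in address:
                domains.add(address.rsplit("@", 1)[1].lower())
    return sorted(domains)


def add_custom_fields(record: dict, custodian: str) -> dict:
    """Return a copy of the record with the three custom fields added."""
    enriched = dict(record)
    date = master_date(record)
    enriched["MasterDate"] = date.isoformat() if date else ""
    enriched["EmailDomains"] = ";".join(email_domains(record))
    enriched["Custodian"] = custodian
    return enriched
```

With this approach, an email falls back from sent date to received date, while a loose file with neither falls through to its last-modified or created date, so every document can be sorted and filtered on one field.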

In addition to custom metadata fields, final preparation activities may also include the preemptive generation of TIFF images of the documents (i.e., PDF-style page images), if there is a desire to review documents in that form (or a need to have them ready for rapid production turnaround later).  And, if the subsequent activities are taking place in a different software platform than the processing (which is often the case), some form of load file will also need to be prepared.

Load files are, essentially, enormous tracking spreadsheets that can contain every document, its extracted metadata (and any custom fields), its extracted text content, links to associated native files, links to standalone text files, links to associated TIFF images, and other details.  They serve as Rosetta Stones for the ECA and/or review software to understand how all the thousands upon thousands of discrete files and pieces of information you’re loading into it for a given project fit together in a usable way.
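To make the idea concrete, here is a simplified sketch of writing and reading a delimited load file with Python's standard `csv` module. The column names, the pipe delimiter, and the relative paths are assumptions for illustration only; actual load file formats and delimiters vary by platform and are usually dictated by the receiving software's specification.

```python
import csv

# Hypothetical column layout: one row per document, with links to its files.
FIELDS = ["DocID", "Custodian", "MasterDate", "NativePath", "TextPath", "ImagePath"]


def write_load_file(documents, handle, delimiter="|"):
    """Write one header row, then one row per document."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter=delimiter, extrasaction="ignore"
    )
    writer.writeheader()
    for doc in documents:
        # Missing values become empty strings so every row has every column.
        writer.writerow({name: doc.get(name, "") for name in FIELDS})


def read_load_file(handle, delimiter="|"):
    """Read the load file back into a list of dictionaries keyed by column."""
    return list(csv.DictReader(handle, delimiter=delimiter))
```

When writing to disk, open the file with `newline=""` as the `csv` documentation recommends, for example `open("loadfile.txt", "w", newline="", encoding="utf-8")`.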


Regardless of the specific steps taken in a given processing project, all processing efforts generally end with some form of quality control validation process prior to the hand-off to ECA and review activities.  As we’ve described above, the end product of a processing effort is a complex assemblage of elements that may include hundreds of thousands of native files, image files, text files, load files, and a variety of customizations.  Given that enormous volume, diversity, and complexity, a wide range of simple technical issues are possible, including file naming errors, load file field errors, file linking errors, imaging errors, and more.

To identify such issues prior to loading for subsequent activities, processors typically employ some combination of targeted quality control checks for specific issues, random sampling checks to spot any other issues, and software validation tools to backstop the human checks.  Once any issues have been identified and remediated, materials can be handed off for ECA and review to begin.
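The sketch below illustrates two of these ideas in Python: a targeted check for missing or duplicate document identifiers and broken file links, and a random sample pulled for manual inspection. It assumes load file rows have already been read into dictionaries, and the field names (`DocID`, `NativePath`, `TextPath`, `ImagePath`) are hypothetical; it is not a substitute for a platform's own validation tools.

```python
import random
from pathlib import Path

# Hypothetical load file columns that hold relative links to files on disk.
LINK_FIELDS = ("NativePath", "TextPath", "ImagePath")


def targeted_checks(rows, root):
    """Return (doc_id, problem) pairs for ID errors and broken file links."""
    issues = []
    seen_ids = set()
    for row in rows:
        doc_id = row.get("DocID", "")
        if not doc_id:
            issues.append(("", "missing DocID"))
        elif doc_id in seen_ids:
            issues.append((doc_id, "duplicate DocID"))
        seen_ids.add(doc_id)
        for field in LINK_FIELDS:
            link = row.get(field, "")
            # Empty links are allowed; populated links must point to a file.
            if link and not (Path(root) / link).is_file():
                issues.append((doc_id, f"{field} not found: {link}"))
    return issues


def qc_sample(rows, size, seed=None):
    """Pick up to `size` rows at random for a human reviewer to inspect."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))
```

Passing a fixed `seed` makes the sample repeatable, which can help if the sampling step needs to be documented or re-run later.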

Key Takeaways

  1. Having at least a basic understanding of processing activities is essential to fulfilling a lawyer’s duty of technology competence for eDiscovery, because processing decisions can and do have substantive effects on downstream discovery activities.
  2. The primary processing activities are: expansion, which opens container files and handles embedded objects; extraction, which captures text and metadata from all those files; normalization, which standardizes the format and appearance of that extracted content; and indexing, which creates the inverted and semantic indices that enable searching.
  3. It is common to encounter materials during processing that require custom work or that cannot be processed at all because they are password protected, corrupted, or of an unknown type; moreover, some source types will almost always require custom work, like mobile devices, instant message/chat platforms, or social media.
  4. During processing, objective culling is typically performed to remove system files (de-NISTing and file-type filtering) and to remove duplicates (deduplication); additionally, you usually have the option to perform content filtering based on date range (very common, especially if negotiated) and keyword (less common, due to process limitations).
  5. Additional steps will need to be taken to finish preparing your processed materials for ECA and review, potentially including custom field creation, TIFF image creation, or load file creation – depending on your goals and tools, and always including some form of quality control validation process to check for common technical errors.

For Assistance or More Information

Xact Data Discovery (XDD) is a leading international provider of eDiscovery, data management and managed review services for law firms and corporations.  XDD helps clients optimize their eDiscovery matters by orchestrating precision communication between people, processes, technology and data.  XDD services include forensics, eDiscovery processing, Relativity hosting and managed review.

XDD offers exceptional customer service with a commitment to responsive, transparent and timely communication to ensure clients remain informed throughout the entire discovery life cycle.  At XDD, communication is everything – because you need to know.  Engage with XDD; we’re ready to listen.

About the Author

Matthew Verga

Director of Education

Matthew Verga is an electronic discovery expert proficient at leveraging his legal experience as an attorney, his technical knowledge as a practitioner, and his skills as a communicator to make complex eDiscovery topics accessible to diverse audiences. A fourteen-year industry veteran, Matthew has worked across every phase of the EDRM and at every level from the project trenches to enterprise program design. He leverages this background to produce engaging educational content to empower practitioners at all levels with knowledge they can use to improve their projects, their careers, and their organizations.
