A multi-part series discussing the realities of eDiscovery in the context of investigations
“It’s like finding a needle in a stack of needles.”
– Robert Rodat, Saving Private Ryan
In the first Part of this series, we reviewed the categories of investigations in which companies are frequently involved and their general eDiscovery ramifications. In the second Part, we discussed the need for speed and secrecy in the conduct of an investigation and ways to achieve them. In this Part, we dive deeper into the need for nuanced analysis and review.
Analysis and review is the process of figuring out what happened by investigating your collected evidence, and that process is made more challenging when relevant individuals have actively tried to conceal what’s happened – or at least tried to be subtle about it while it was happening. As we discussed in the first Part, investigations often concern misconduct of one kind or another, and most individuals instinctively try to render their own misconduct non-obvious.
It is not uncommon for individuals to communicate using euphemisms or coded language or to communicate using alternative channels. In some cases, individuals will attempt to obfuscate through the use of misleading file names or changes to file extensions. And, as we discussed in the second Part, some individuals will also attempt to destroy evidence if given the opportunity.
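One simple check for renamed files is to compare a file's extension with the signature bytes at the start of its content. The following is a minimal Python sketch of that idea, not a description of any particular forensic tool. The function names and the deliberately tiny signature table are our own illustrative assumptions, and real tools rely on much larger signature databases.

```python
import os

# A few well-known file signatures ("magic bytes"); illustrative, not exhaustive.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),  # also the container format for .docx/.xlsx/.pptx
]

# Content type we would expect for a given extension (hypothetical mapping).
EXPECTED = {
    ".pdf": "pdf", ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg",
    ".zip": "zip", ".docx": "zip", ".xlsx": "zip", ".pptx": "zip",
}


def sniff_type(header):
    """Return the content type suggested by the leading bytes, or None."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return None


def extension_mismatch(filename, header):
    """True when recognised content contradicts the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    actual = sniff_type(header)
    # Flag only when the content is recognised AND disagrees with the name.
    return actual is not None and actual != EXPECTED.get(ext)


# Example: a PDF that has been renamed to look like a text file.
print(extension_mismatch("meeting_notes.txt", b"%PDF-1.7 ..."))
```

A flagged file is a lead for closer inspection rather than proof of concealment, since mismatches also arise innocently.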
To overcome these challenges, the analysis and review process must be undertaken with these realities in mind, and it must be carried out in a way that will help you find these well-hidden needles.
Thankfully, there are a variety of tools and techniques you can employ to help you find hidden things and other unknown unknowns:
One of the most useful tools for effective early case assessment, analysis, and review planning is random sampling. Random sampling is not susceptible to our biases and blind spots the way keyword searches are. It provides a cross-section of everything you have, including the unknown unknowns. It provides examples of the different kinds of language used by relevant individuals in different contexts, which in turn helps you identify language that stands out as atypical. If executed formally, with a truly random draw and a documented sample size, it can even provide you with statistically reliable estimates of how prevalent different types of material are in your overall collection.
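To make the prevalence estimate concrete, here is a minimal Python sketch of a formal sample: a reproducible simple random draw, followed by a confidence interval around the proportion of sampled documents that reviewers marked relevant. The function names, the seed, and the choice of the Wilson score interval are our own assumptions for illustration; your review platform may calculate this differently.

```python
import math
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a simple random sample of document IDs without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced later
    return rng.sample(list(doc_ids), sample_size)


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is about 95% confidence)."""
    if n == 0:
        raise ValueError("sample is empty")
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Example: sample 400 of 100,000 documents; suppose reviewers mark 40 relevant.
sample = draw_sample(range(100_000), 400, seed=42)
low, high = wilson_interval(hits=40, n=len(sample))
print(f"Estimated prevalence: {40 / len(sample):.1%} ({low:.1%} to {high:.1%})")
```

In this hypothetical, 40 relevant documents out of 400 suggests that roughly 7% to 13% of the full collection is relevant, which is useful for planning review effort before any keywords are chosen.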
Many modern discovery tools make possible some form of frequency analysis. Rather than just testing various search strings and filters, frequency analysis tools show you all of the values of a particular type present in a given set of materials and tell you how frequently they occur. For example, you might be able to review a list of email correspondents for a particular custodian to learn who they correspond with most often and to look for any names that shouldn’t be there. Or, a list of all frequently occurring short phrases (e.g., 2-4 words) could be generated and reviewed to look for new potential search phrases and potentially relevant euphemisms or coded language. As with random sampling, such analysis can reveal things for which you would not otherwise have known to look.
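Both examples can be sketched in a few lines of Python. This is only an illustration of the idea, assuming each message is a dictionary with a "to" list of addresses (a hypothetical structure, not any tool's export format) and that short phrases are simple runs of 2-4 lowercase words.

```python
import re
from collections import Counter


def correspondent_counts(messages):
    """Count how often each recipient address appears across messages."""
    counts = Counter()
    for msg in messages:
        for addr in msg["to"]:
            counts[addr.lower()] += 1  # normalise case so duplicates merge
    return counts


def phrase_counts(texts, min_n=2, max_n=4):
    """Count every run of min_n..max_n consecutive words across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        for n in range(min_n, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


# Example with made-up data.
messages = [
    {"to": ["ann@example.com", "bob@example.com"]},
    {"to": ["Ann@example.com"]},
]
texts = ["the special package arrived", "send the special package"]

print(correspondent_counts(messages).most_common(5))
print(phrase_counts(texts).most_common(5))
```

Reviewing the top of each list, and also the unexpected entries further down, is where unfamiliar names and recurring odd phrases tend to surface.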
Conceptual analytic tools include concept searching, concept clustering, and find-more-like-this features. Conceptual tools look at patterns of language rather than the specific language itself to facilitate searching and sorting that is more tied to contextual meaning than to precise phrasing. As with random sampling and frequency analysis, these tools are useful for finding things when you don’t know exactly what you’re seeking. Conceptual searching will return related results even if the exact words don’t match; conceptual clustering can reveal topics you didn’t know were there; and find-more-like-this features can extrapolate from one relevant document (or even from a synthetic example of your creation) to find similar documents in your collection.
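The ranking step behind a find-more-like-this feature can be illustrated with a short Python sketch. One important caveat: commercial conceptual tools often rely on latent semantic or embedding models, whereas this sketch substitutes plain TF-IDF weighting with cosine similarity, which only rewards shared vocabulary. It shows the mechanics of ranking a collection against a seed document, not true conceptual matching, and all names in it are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per text."""
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency: rarer terms weigh more.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(seed_text, texts, top_k=3):
    """Return (index, score) pairs for the texts most similar to the seed."""
    vectors = tfidf_vectors(list(texts) + [seed_text])
    seed_vec = vectors[-1]
    scored = [(cosine(seed_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, round(score, 3)) for score, i in scored[:top_k]]


# Example: the seed can be a real document or a synthetic one you write yourself.
docs = [
    "the package will be delivered off the books on friday",
    "quarterly budget meeting agenda attached",
    "lunch menu for the cafeteria",
]
print(more_like_this("deliver the package off the books", docs, top_k=2))
```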
When investigating individuals’ conduct, email threading features are very helpful for understanding the sequence of relevant communications and the individuals involved. Many discovery tools now offer visualization features built on threading analysis to let you map the flow of communication. It is also often critical in investigations to be able to aggregate material from multiple communication sources in chronological order. It is not uncommon for a conversation to begin in email, continue in an instant messaging tool, and conclude in SMS text messages. Only with all of the pieces in one place can the full sequence of events be reviewed.
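As a rough illustration of that aggregation step, the Python sketch below merges messages from several sources into one timeline. It assumes each source has already been parsed into records carrying a timezone-aware timestamp; the Message class and its fields are hypothetical and stand in for whatever your tools export.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Message:
    sent: datetime  # must be timezone-aware
    source: str     # e.g. "email", "chat", "sms"
    sender: str
    text: str


def build_timeline(*sources):
    """Merge messages from any number of sources into one chronological list."""
    merged = [msg for source in sources for msg in source]
    for msg in merged:
        if msg.sent.utcoffset() is None:
            # Naive timestamps cannot be ordered reliably across systems.
            raise ValueError(f"timestamp without time zone: {msg}")
    # Compare everything in UTC so local time zones do not distort the order.
    return sorted(merged, key=lambda msg: msg.sent.astimezone(timezone.utc))


# Example: one conversation spread across three channels and time zones.
eastern = timezone(timedelta(hours=-5))
central_european = timezone(timedelta(hours=1))
email = [Message(datetime(2020, 1, 6, 9, 0, tzinfo=eastern), "email", "a", "Can we talk?")]
chat = [Message(datetime(2020, 1, 6, 14, 5, tzinfo=timezone.utc), "chat", "b", "Not here.")]
sms = [Message(datetime(2020, 1, 6, 15, 10, tzinfo=central_european), "sms", "a", "Call me.")]

for msg in build_timeline(sms, chat, email):
    print(msg.sent.astimezone(timezone.utc).isoformat(), msg.source, msg.text)
```

Normalizing to a single time zone matters here because different systems often record local times differently, and a conversation can look out of order if the raw timestamps are compared directly.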
As we noted in the previous Part, “The Need for Speed & Secrecy,” technology-assisted review can be used freely during internal investigations (and in many agency-initiated investigations). In addition to providing the speed advantage we discussed, technology-assisted review also extends the conceptual analytics advantage into your review process, taking your decisions and extrapolating from them based on semantic patterns rather than precise wording.
Finally, in the event that your analysis or review reveals gaps in the collection (or that you have other reasons to believe spoliation may have taken place), forensic analysts can undertake a variety of investigative steps to attempt to recover deleted materials or to determine user activities on a given device (e.g., to document data theft or destruction).
Upcoming in this Series
In the next Part of this series, we will discuss in more detail the need to be prepared for later litigation.