A multi-part series on the fundamentals eDiscovery practitioners need to know about effective early case assessment in the context of electronic discovery
In “Clearing the Fog of War,” we reviewed the uncertainty inherent in new matters and the three overlapping goals of ECA. In “Sampling a Well-Stocked Toolkit,” we began our survey of available tools and techniques with an overview and with a discussion of sampling. In this Part, we continue our survey with a discussion of searching and filtering options.
As we discussed in the last Part, almost any modern document review platform provides case teams with a powerful set of tools for investigating their collected ESI in pursuit of the three goals, like a series of overlapping lenses you can use to bring your quarry into sharp focus. After sampling, the next major type of tools and techniques available is search and filtering, including keyword and phrase searching, Boolean searching, fuzzy searching, conceptual searching, and more.
Searching, both on the internet and among our own emails, messages, and files, has become an inescapable part of everyday life. Almost all of this searching, like the searching you do in eDiscovery, is powered by some form of indexing. In the eDiscovery context, indexing is typically performed during the processing phase of the project.
Indexing is the process of creating the enormous tables of information used to power search features. Most common are “inverted indices,” which essentially make it possible to look up documents by the words within them. Inverted indices are like more elaborate versions of the indices you find in the backs of books. Decisions made during processing about how indices should be generated and which common words (e.g., articles, prepositions) they should skip affect the completeness of the search results you get during ECA. Searches can only find what the indices contain.
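To make the concept concrete, here is a minimal Python sketch of an inverted index. The document texts and the stop-word list are hypothetical illustrations; real processing engines are far more sophisticated, but the core idea is the same: each term maps to the set of documents containing it, and any word skipped at indexing time can never be found by a later search.

```python
from collections import defaultdict

# Hypothetical "noise word" list; real tools have configurable stop-word lists.
STOP_WORDS = {"the", "a", "an", "of", "to", "in"}

def build_inverted_index(documents):
    """Map each indexed term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            word = word.strip(".,!?;:")
            if word and word not in STOP_WORDS:
                index[word].add(doc_id)
    return index

docs = {
    1: "The merger closed in March.",
    2: "Counsel reviewed the merger agreement.",
    3: "Quarterly results were filed.",
}
index = build_inverted_index(docs)
print(sorted(index["merger"]))  # documents containing "merger" -> [1, 2]
print("the" in index)           # skipped at indexing, so unsearchable -> False
```

Note that because “the” was excluded when the index was built, no search against this index can ever match it, which is exactly why processing-time indexing decisions constrain what ECA searches can find.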
More sophisticated semantic indices are created to power features like concept searching, concept clustering, and technology-assisted review, which we will discuss further in our section on analytic tools and techniques.
The types of indices that are prepared and the specific features your software offers for working with them will dictate what types of searching are available to you.
Just as it says on the tin, keyword and phrase searching lets you search for a key word, for a phrase, or often, for lists of both at once. Just as with the basic internet searching we all use, if one of the desired keywords or phrases is present, the document will be returned. One key area of variation from tool to tool is whether wildcard characters can be used to find variations on words and, if so, how they can be used.
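As a rough illustration of how a wildcard might work, the following sketch treats `*` as “any run of word characters,” so that a single pattern matches multiple word variants. The function name and documents are hypothetical, and actual platforms vary in which wildcard characters they support and where in a word they may appear.

```python
import re

def wildcard_search(documents, pattern):
    """Return IDs of documents matching a keyword pattern, where '*'
    matches any run of word characters (e.g., 'invit*' finds both
    'invite' and 'invitation')."""
    regex = re.compile(
        r"\b" + re.escape(pattern).replace(r"\*", r"\w*") + r"\b",
        re.IGNORECASE,
    )
    return {doc_id for doc_id, text in documents.items() if regex.search(text)}

docs = {
    1: "Please send the invitation today.",
    2: "We should invite outside counsel.",
    3: "The invoice was paid.",
}
print(sorted(wildcard_search(docs, "invit*")))  # -> [1, 2]
```

Note how the trailing wildcard captures the variants without sweeping in the superficially similar “invoice,” which is the kind of behavior worth verifying in whatever tool you actually use.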
Boolean search is the next step up in sophistication from basic keyword searching. It allows the use of “operators” such as “and,” “or,” and “not.” These operators allow the searcher to define specific relationships between key words and phrases to achieve higher-quality results (i.e., improved recall and precision). Other operators may be available, including proximity operators (e.g., to find one particular word appearing within a certain number of words of another).
The range of specific operators available varies with the tools being used, as can their precise operation. Thus, it is important to understand the tools you are actually using to be sure you are searching the way you intend.
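The following sketch shows, in simplified form, how “and,” “or,” “not,” and a proximity operator might behave. All function names and documents here are hypothetical, and real platforms implement these operators with differing syntax and semantics, which is precisely why you should verify how your own tool behaves.

```python
def tokens(text):
    """Split text into lowercase words, stripping simple punctuation."""
    return [w.strip(".,!?;:").lower() for w in text.split()]

def contains(text, word):
    return word in tokens(text)

def boolean_search(documents, must=(), any_of=(), must_not=()):
    """AND together all 'must' terms, OR across 'any_of' terms,
    and exclude any document containing a 'must_not' term."""
    hits = set()
    for doc_id, text in documents.items():
        if (all(contains(text, w) for w in must)
                and (not any_of or any(contains(text, w) for w in any_of))
                and not any(contains(text, w) for w in must_not)):
            hits.add(doc_id)
    return hits

def within(text, a, b, n):
    """Proximity operator: True if words a and b appear within n words
    of each other in the text."""
    ws = tokens(text)
    pos_a = [i for i, w in enumerate(ws) if w == a]
    pos_b = [i for i, w in enumerate(ws) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

docs = {
    1: "The board approved the merger.",
    2: "The merger was delayed.",
    3: "The board met quarterly.",
}
print(boolean_search(docs, must=("merger",), must_not=("delayed",)))  # -> {1}
print(within(docs[1], "board", "merger", 3))  # "board" within 3 words -> True
```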
Fuzzy searching (also referred to as approximate string matching) is another extension of basic keyword searching that may be available to you. Fuzzy searching allows a search to return close variations on a word rather than just the precise word you searched, which is useful for catching misspellings and spelling variants (e.g., finding both Katherine and Catherine). A related feature, stemming, matches grammatical variants of a word (e.g., finding both invite and invitation). How much variation fuzzy searching allows is typically an adjustable setting.
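One simple way to approximate fuzzy matching is with a string-similarity ratio, as in the sketch below using Python’s standard `difflib`. The similarity threshold plays the role of the adjustable “fuzziness” setting: lowering it allows more variation. The documents and function names are hypothetical, and commercial tools use their own (often more sophisticated) similarity measures.

```python
import difflib

def fuzzy_match(term, candidate, threshold=0.8):
    """True if two words are similar enough (1.0 means identical).
    Lowering the threshold permits more variation."""
    ratio = difflib.SequenceMatcher(None, term.lower(), candidate.lower()).ratio()
    return ratio >= threshold

def fuzzy_search(documents, term, threshold=0.8):
    """Return IDs of documents containing a word similar to the term."""
    return {doc_id for doc_id, text in documents.items()
            if any(fuzzy_match(term, w.strip(".,"), threshold)
                   for w in text.split())}

docs = {
    1: "Her name was spelled Katherine.",
    2: "Catherine signed the contract.",
    3: "The kitchen was closed.",
}
print(sorted(fuzzy_search(docs, "Katherine", 0.8)))  # -> [1, 2]
```

At the 0.8 threshold, the spelling variant “Catherine” is caught while the unrelated “kitchen” is not; loosening the threshold further would begin to pull in more false positives, which is the fundamental tradeoff of fuzzy matching.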
As noted above, conceptual searching is powered by different types of indices than traditional searching. Conceptual searching uses these indices to try to return results based on related ideas and topics rather than based on whether the same words and phrases are used.
In addition to these core search functions, most review tools also offer a range of reporting and administration tools (e.g., saved searches, search history, etc.) – as well as sampling tools – to assist you in brainstorming, testing, and iteratively improving searches to meet your information needs.
In addition to your searching options, most platforms also offer you a range of options for sorting and filtering by specific properties of documents to help you surface what matters and prioritize what matters most. Most often this is based on a combination of metadata values extracted from the documents, such as file type and date, and custom-created metadata values, such as domain name or custodian. Often, these types of sorting and filtering capabilities are now tied to visualization tools that let you see the distribution of materials (and any gaps in it) at a glance and that allow you to adjust a range of value limits to see how they narrow or expand your results.
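Conceptually, this kind of metadata filtering amounts to narrowing a collection by field values and ranges, as in the sketch below. The field names, custodians, and dates are hypothetical placeholders for the extracted and custom metadata values a real platform would expose.

```python
from datetime import date

# Hypothetical extracted/custom metadata for three documents.
documents = [
    {"id": 1, "custodian": "Smith", "file_type": "email",
     "date": date(2022, 3, 14), "domain": "acme.com"},
    {"id": 2, "custodian": "Jones", "file_type": "email",
     "date": date(2021, 11, 2), "domain": "example.org"},
    {"id": 3, "custodian": "Smith", "file_type": "spreadsheet",
     "date": date(2022, 6, 30), "domain": None},
]

def filter_docs(docs, custodian=None, file_type=None, start=None, end=None):
    """Narrow a collection by custodian, file type, and date range;
    each criterion is optional."""
    out = []
    for d in docs:
        if custodian and d["custodian"] != custodian:
            continue
        if file_type and d["file_type"] != file_type:
            continue
        if start and d["date"] < start:
            continue
        if end and d["date"] > end:
            continue
        out.append(d)
    return out

hits = filter_docs(documents, custodian="Smith", start=date(2022, 1, 1))
print([d["id"] for d in hits])  # -> [1, 3]
```

Visualization features in review platforms are essentially interactive front ends over this kind of filtering, letting you drag date-range and value limits and immediately see how the result set narrows or expands.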
Just as we noted that sampling is excellent for finding unknown unknowns (the materials for which you don’t know you need to be looking), so searching is excellent for finding your known knowns and known unknowns (the materials for which you do know you need to be looking). Targeted searching of key words, names, and phrases is one of the fastest ways to find relevant materials and hot documents. It is where practitioners focused on the traditional ECA goal typically start.
Filtering and visualization tools, on the other hand, are excellent at EDA – helping you assess the completeness of your collection, the most important dates and sources, the connections between custodians, and more. Leveraging these tools can quickly identify gaps that need to be filled through further collection and rich veins of materials that should be prioritized for further assessment. They can also assist in traditional ECA by illustrating the flow of communication between and with the key players.
Both searching and filtering are important to the goal of Downstream Prep. When looking ahead to review and production, your goal is to eliminate as much of the chaff as possible without losing an unreasonable amount of the wheat. The more you refine your searches, the more search terms you can negotiate into the agreed-upon scope, and the more filters and exclusions you apply, the more of that chaff you can eliminate, thereby reducing the cost and duration of all downstream activities. Even prioritization and organization without volume reduction will yield savings and quality benefits in downstream activities.
Upcoming in this Series
In the next Part, we will continue our review of available tools and techniques with a discussion of threading and duplicates.