Seven years after it first rose to prominence in eDiscovery, technology-assisted review remains an important, and at times controversial, tool in the eDiscovery practitioner’s toolkit.
In “Still Crazy after All These Years,” we discussed the slow but steady growth in the importance of TAR. In this Part, we review the first case to address TAR, da Silva Moore.
The year after technology-assisted review first rose to prominence in eDiscovery, Monique da Silva Moore, et al., v. Publicis Groupe SA & MSL Group (S.D.N.Y. Feb. 24, 2012) became the first case in which the use of TAR was judicially approved. In da Silva Moore, the Plaintiffs were pursuing a class action over alleged gender discrimination by the Defendants. During the course of discovery, the parties reached an impasse over the appropriate methodology by which to evaluate the approximately 3,000,000 emails the Defendants had gathered for the matter.
Given the volume of material to be reviewed, the Defendants proposed using a predictive coding solution. The methodology they proposed involved a seed set development process, an iterative training process, and voluntary disclosures to the Plaintiffs:
The seed set would be developed through a combination of methods intended to cover all bases: review of a statistically significant random sample, additional judgmental sampling, and review of the top 50 results from a variety of searches (including searches suggested by the Plaintiffs).
After the seed set was used to begin the predictive coding process, seven rounds of iterative training would be conducted to refine the predictive coding results. Each training round would involve the review of 500 documents, and the final round would be followed by review of another statistically significant random sample to determine how much relevant material, if any, the process had missed.
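The proposed protocol implies a rough human-review budget that can be estimated from its components. The sketch below is a hypothetical Python illustration: the opinion calls only for a “statistically significant” sample, so the 95% confidence level and ±5% margin of error used here, along with the helper names, are illustrative assumptions rather than figures from the case.

```python
import math


def sample_size(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Normal-approximation size for a simple random sample.

    Assumed defaults (not from the opinion): 95% confidence (z = 1.96),
    +/-5% margin of error, and worst-case prevalence p = 0.5.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Figures drawn from the Defendants' proposal in da Silva Moore.
TRAINING_ROUNDS = 7    # seven iterative training rounds
DOCS_PER_ROUND = 500   # 500 documents reviewed per round


def tar_review_budget() -> int:
    """Rough count of documents humans review under the proposed protocol."""
    seed_sample = sample_size()   # initial random sample feeding the seed set
    training = TRAINING_ROUNDS * DOCS_PER_ROUND
    validation = sample_size()    # final sample checking for missed material
    return seed_sample + training + validation
```

Under these assumed defaults, the two random samples of 385 documents each plus seven rounds of 500 come to 4,270 reviewed documents — a small fraction of the roughly 3,000,000 emails at issue, which is the efficiency argument at the heart of the Defendants’ proposal.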
At the end of a successful predictive coding process, the Defendants proposed to review and produce the top 40,000 results identified by that process, a scope they argued was proportional to the needs of the matter. Additionally, for the sake of process transparency, the Defendants would turn over all non-privileged documents reviewed to create the seed set (both relevant and not), all non-privileged documents reviewed during iterative training (both relevant and not), and all non-privileged documents reviewed during the final random sample check for missed material (both relevant and not).
The Plaintiffs’ objections were not to the use of predictive coding itself but to several specifics of the Defendants’ proposed workflow. They objected most strenuously to the Defendants’ plan to review and produce only the top 40,000 results of the predictive coding process and to the lack of fixed standards against which to measure the quality and reliability of the process and its results (e.g., no bright line for maximum acceptable amount of missed material).
In his decision approving the use of predictive coding, Magistrate Judge Peck tackled the topic head on, providing an explanation of how predictive coding operates, performance comparisons with more traditional methods, and numerous citations to relevant studies and articles. He explicitly made the case for predictive coding as a desirable, efficient solution for large-scale review challenges, and he based his ultimate approval of its use in this case on five factors:
. . . (1) the parties’ agreement, (2) the vast amount of ESI to be reviewed (over three million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process proposed by [the Defendants].
Regarding the specifics of the Defendants’ proposal, however, he did make some adjustments. First, he rejected their plan to cut off review and production after the top 40,000 results, calling it a “pig in a poke.” Second, he warned the Defendants that they could not decide in advance that seven rounds of training would be sufficient. When to cut off review and when to cut off training were both questions of proportionality, to be answered with fact-based determinations using real information about costs and results after work had begun.
Magistrate Judge Peck also addressed the correct timing of proportionality determinations in responding to the Plaintiffs’ objections to his decision. Among the Plaintiffs’ objections was that the Magistrate Judge was “simply kicking the can down the road” by allowing the Defendants to proceed without predetermined standards against which to measure quality and reliability. He responded by explaining why “down the road” is, in fact, the right place to address such objections and make such determinations:
In order to determine proportionality, it is necessary to have more information than the parties (or the Court) now has, including how many relevant documents will be produced and at what cost . . . . In the final sample of documents deemed irrelevant, are any relevant documents found that are “hot,” “smoking gun” documents (i.e., highly relevant)? Or are the only relevant documents more of the same thing? One hot document may require the software to be re-trained (or some other search method employed), while several documents that really do not add anything to the case might not matter. These types of questions are better decided “down the road,” when real information is available to the parties and the Court.
Upcoming in this Series
In the next Part of this series, we will review the Kleen Products case, in which one party attempted to compel the other to use a technology-assisted review solution.