Judge Scheindlin issued an opinion Friday in a FOIA case criticizing the practice of having government workers collect documents from their own files without strict oversight from counsel. The opinion will likely be cited in discovery disputes because, as Judge Scheindlin notes, “much of the logic behind the increasingly well-developed caselaw on e-discovery searches is instructive in the FOIA search context” (and thus, presumably, vice versa). The government’s brief had argued, “[i]t is . . . unclear why custodians could not be trusted to run effective searches of their own files, a skill that most office workers employ on a daily basis,” to which Judge Scheindlin responded:
There are two answers to defendants’ question. First, custodians cannot “be trusted to run effective searches,” without providing a detailed description of those searches, because FOIA places a burden on defendants to establish that they have conducted adequate searches; FOIA permits agencies to do so by submitting affidavits that “contain reasonable specificity of detail rather than merely conclusory statements.” Defendants’ counsel recognize that, for over twenty years, courts have required that these affidavits “set forth the search terms and the type of search performed.” But, somehow, DHS, ICE, and the FBI have not gotten the message. So it bears repetition: the government will not be able to establish the adequacy of its FOIA searches if it does not record and report the search terms that it used, how it combined them, and whether it searched the full text of documents.

The second answer to defendants’ question has emerged from scholarship and caselaw only in recent years: most custodians cannot be “trusted” to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities. Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context. Simple keyword searching is often not enough: “Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.” There is increasingly strong evidence that “[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.” As Judge Andrew Peck, one of this Court’s experts in e-discovery, recently put it: “In too many cases, however, the way lawyers choose keywords is the equivalent of the child’s game of ‘Go Fish’ . . . keyword searches usually are not very effective.”

There are emerging best practices for dealing with these shortcomings and they are explained in detail elsewhere. There is a “need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used to produce emails or other electronically stored information.” And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents. Through iterative learning, these methods (known as “computer-assisted” or “predictive” coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches.

In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies’ unsupported assertions that their lay custodians have designed and conducted a reasonable search.