The Prosecution often produces voluminous discovery in multiple electronic formats. We try to make that discovery more manageable using multiple tools.

Text Conversion Tools

Files containing text should be in a searchable format. Files in Word, Excel, ASCII, and many other formats are inherently searchable, but other formats may require Optical Character Recognition (OCR) and conversion to PDF, Word, or other formats. We primarily use OmniPage by Nuance for such tasks.

Take for example the following image file.

 

The file is readable by the human eye, but because it is an image, a computer cannot search it. If the defense were looking for all mentions of the word “summer,” a computer search of this image would not find the word. After conversion to PDF with OCR, the file is searchable in principle, but the results are not error-free. Because the original image is rotated, no text is found in the PDF. Even after rotation, many words are converted incorrectly, such as IDREAM instead of DREAM. The OCR conversion programs tested had an accuracy rate below 20%.
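One way to put a number on OCR quality is to compare the OCR output with a hand-typed transcription of the same page. The following is a minimal sketch in Python, not the method used for the figure above; the function names and the sample strings are hypothetical, and "accuracy" is assumed here to mean the share of reference words that the OCR output reproduces in order.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_accuracy(reference, ocr_output):
    """Return the fraction of reference words that appear, in order, in the OCR output."""
    ref = words(reference)
    hyp = words(ocr_output)
    if not ref:
        return 0.0
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


# Hypothetical sample: a hand-typed transcription and a flawed OCR result.
reference = "dream of summer"
ocr_output = "IDREAM of sumrner"
print(f"Word accuracy: {word_accuracy(reference, ocr_output):.0%}")
```

A score like this only helps decide which documents need manual correction; it says nothing about which specific words a later search will miss.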

Conversion to Word gives an even worse result, even after rotation, with all words converted incorrectly. However, the Word version can be corrected manually by comparing it with the image and typing the correct words. This is labor-intensive and should be reserved for important documents in important cases. The result should be a fully searchable document, although the Word document will often look quite different from the original image.

Many programs can be used for OCR conversions, such as OmniPage, SimpleOCR, PaperPort, OCRFeeder, and Adobe Acrobat Pro.

Text Search Tools

Multiple desktop search tools are available, but each seems to give slightly different results, so we use several.

Our primary search tool is X1. Searches in X1 are very fast once an index exists, but creating the index can be painfully slow. For example, I attempted to index approximately 2 terabytes of discovery, and the index took nearly 24 hours to complete, even though indexing was the primary process running on the computer. Once the index is created, however, X1 is blazing fast and has robust search features for finding the desired discovery.

X1 allows Boolean queries with operators such as NEAR, AND, OR, and NOT, and it allows wildcards and parentheses to control the order of operations. Searches can also be limited by date or file location. This helps when searching for references to specific persons in discovery. For example, a search for “Fred NEAR Jones” will find documents with the word Fred within ten words of the word Jones, and a search for “Jones -Melissa -Dave” will find documents with the word Jones but not Melissa or Dave.
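To make the logic of those two example queries concrete, here is a minimal sketch in Python. It is not X1's implementation or query syntax; the function names and the sample documents are hypothetical, and "within ten words" is assumed to mean the two word positions differ by ten or fewer.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, first, second, distance=10):
    """Return True if `first` occurs within `distance` words of `second`."""
    tokens = tokenize(text)
    first_positions = [i for i, tok in enumerate(tokens) if tok == first.lower()]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second.lower()]
    return any(abs(i - j) <= distance
               for i in first_positions for j in second_positions)


def contains_without(text, required, excluded):
    """Return True if `required` appears and none of the `excluded` words do."""
    tokens = set(tokenize(text))
    return required.lower() in tokens and not any(
        word.lower() in tokens for word in excluded)


# Hypothetical documents standing in for files in a discovery set.
documents = {
    "memo.txt": "Fred met with Officer Jones on Tuesday.",
    "report.txt": "Jones spoke with Melissa about the case.",
    "notes.txt": "Jones was not present.",
}

near_hits = [name for name, text in documents.items()
             if near(text, "Fred", "Jones")]
exclusion_hits = [name for name, text in documents.items()
                  if contains_without(text, "Jones", ["Melissa", "Dave"])]
print(near_hits)
print(exclusion_hits)
```

A real desktop search tool answers these queries from a prebuilt index rather than rescanning every file, which is why searches are fast once indexing has finished.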

The latest version of Copernic Desktop Search has built-in OCR functions, so we may switch to it as our primary search tool. Both X1 and Copernic are powerful search tools, but they give slightly different results to queries, so for now we use both on important jobs, and we use Lookeen as well.
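When the same query is run in more than one tool, the differences between the result lists are usually the part worth reviewing. The sketch below, in Python, assumes each tool's hits have been exported or copied into a plain list of file paths; the function name, tool labels, and paths are hypothetical, and none of this reflects a built-in feature of X1, Copernic, or Lookeen.

```python
def compare_results(results_by_tool):
    """Split hits into files every tool found and files that some tools missed.

    Returns (agreed, disputed): `agreed` is a sorted list of paths found by
    all tools, and `disputed` maps each remaining path to the sorted list of
    tools that did NOT return it.
    """
    found_by = {}
    for tool, paths in results_by_tool.items():
        for path in paths:
            found_by.setdefault(path, set()).add(tool)

    all_tools = set(results_by_tool)
    agreed = sorted(path for path, tools in found_by.items()
                    if tools == all_tools)
    disputed = {path: sorted(all_tools - tools)
                for path, tools in sorted(found_by.items())
                if tools != all_tools}
    return agreed, disputed


# Hypothetical result lists for one query run in three tools.
results = {
    "X1": ["interview.pdf", "report.docx", "email_017.msg"],
    "Copernic": ["interview.pdf", "report.docx", "scan_004.pdf"],
    "Lookeen": ["interview.pdf", "email_017.msg"],
}

agreed, disputed = compare_results(results)
print("Found by all tools:", agreed)
for path, missing in disputed.items():
    print(f"{path} was not returned by: {', '.join(missing)}")
```

Files in the disputed group are good candidates for a manual look, since a miss may point to an unindexed folder or a file that still needs OCR.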

X1, Copernic Desktop Search, and Lookeen are all excellent paid options (we spend about $150 a year on subscriptions and upgrades), but for someone wanting free options, try Voidtools Everything or Listary.
