The Key to Curbing Cost Pressures of Big Data

Nextpoint, Inc.

Cost pressures from big data in ediscovery have made Early Data Assessment, an often-overlooked process, essential in modern litigation.

Legal data volumes have exploded in the digital age, and poorly managed data can translate into huge cost increases. It’s more important than ever for legal teams to make strategic decisions at the outset of a case to minimize costs and stay on top of the mountain of potential evidence.

Early Data Assessment (EDA) is a key process that can reduce these massive data loads and give legal teams valuable insights to inform their strategy from the beginning of a case. Earlier this week, Nextpoint launched its new Data Mining software for EDA and ECA (Early Case Assessment). Tools like Data Mining have become essential for modern legal teams looking to embrace EDA and curb the cost pressures brought on by huge quantities of data.

Strong EDA software offers tools for both data reduction and data analysis. Here are some techniques and strategies that help parties reduce their data burden and better understand their case:

1. Date Range Analysis

One of the quickest ways to exclude large amounts of unnecessary data from review is to establish a time range in which files could possibly be relevant to the matter at hand.

For example, if an ediscovery request seeks all documents relevant to a marketing promotion that ran from October 2020 to February 2021, you could try to reach an agreement with opposing counsel to exclude documents created before 2020 or after February 2021 from discovery.

EDA tools like Data Mining can filter out documents created before or after a specified date range so you don’t waste time processing them later in discovery. They can also identify any chronological gaps that exist in your data.
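To make the date filter and the gap check concrete, here is a minimal Python sketch. It assumes each document has already been reduced to a hypothetical `(doc_id, created_date)` pair; real tools pull these dates from file system or email metadata. The function names and the 60-day gap threshold are illustrative choices, not Data Mining features.

```python
from datetime import date

# Hypothetical record format: (doc_id, created_date). Real EDA tools read the
# date from file system or email metadata during processing.


def filter_by_date_range(docs, start, end):
    """Keep documents created within [start, end], inclusive on both ends."""
    return [(doc_id, created) for doc_id, created in docs if start <= created <= end]


def find_gaps(dates, min_gap_days):
    """Return (earlier, later) pairs of consecutive dates more than min_gap_days apart."""
    ordered = sorted(set(dates))
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if (later - earlier).days > min_gap_days
    ]


if __name__ == "__main__":
    docs = [
        ("DOC-001", date(2019, 11, 3)),
        ("DOC-002", date(2020, 10, 15)),
        ("DOC-003", date(2020, 11, 2)),
        ("DOC-004", date(2021, 1, 20)),
        ("DOC-005", date(2021, 6, 1)),
    ]
    # Window from the marketing-promotion example: January 2020 through February 2021.
    kept = filter_by_date_range(docs, date(2020, 1, 1), date(2021, 2, 28))
    print([doc_id for doc_id, _ in kept])  # ['DOC-002', 'DOC-003', 'DOC-004']
    # Flags the 79-day stretch between Nov 2, 2020 and Jan 20, 2021.
    print(find_gaps([created for _, created in kept], min_gap_days=60))
```

Using plain `datetime.date` values keeps the comparisons simple; documents carrying full timestamps would need to be normalized to a single timezone before filtering.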

2. DeNISTing

Logic dictates that only files created or modified by users are potentially relevant to a review. But computers also contain lots of common system files that users never touch.

The process of filtering out those non-evidentiary files is called “deNISTing,” a reference to the National Institute of Standards and Technology (NIST), whose National Software Reference Library (NSRL) catalogs known system and application files. DeNIST filters work by comparing each file’s hash value against the NSRL list and eliminating the matches during ediscovery data processing.
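The cross-referencing step amounts to a hash lookup. NIST publishes its reference data through the National Software Reference Library (NSRL) as lists of file hashes. The sketch below assumes those reference hashes have already been loaded into a set of SHA-1 hex strings (loading the actual NSRL distribution is out of scope here), and it uses in-memory byte strings with hypothetical file names in place of real files.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex-encoded SHA-1 digest of a file's contents."""
    return hashlib.sha1(data).hexdigest()


def denist(files, known_hashes):
    """Split (name, contents) pairs into files to keep and known system files to drop.

    known_hashes stands in for a reference list such as NSRL; hashes are
    compared case-insensitively in case the list uses uppercase hex.
    """
    known = {h.lower() for h in known_hashes}
    kept, removed = [], []
    for name, data in files:
        if sha1_hex(data) in known:
            removed.append(name)
        else:
            kept.append(name)
    return kept, removed


if __name__ == "__main__":
    # Hypothetical reference entry: pretend these bytes appear in the known-file list.
    reference = {sha1_hex(b"stock operating system library").upper()}
    files = [
        ("C:/Windows/System32/example.dll", b"stock operating system library"),
        ("C:/Users/jdoe/Documents/memo.docx", b"draft memo about the promotion"),
    ]
    kept, removed = denist(files, reference)
    print(kept)     # ['C:/Users/jdoe/Documents/memo.docx']
    print(removed)  # ['C:/Windows/System32/example.dll']
```

Matching on content hashes rather than file names means a renamed system file is still caught, while a user document that happens to share a system file’s name is not dropped.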

3. Deduplication

If identical copies of a document exist in a collection, it doesn’t make sense to waste time reviewing all of them. Deduplication filters make note of these copies and identify them for exclusion during ediscovery data processing.

This is especially useful when reviewing emails. Each sent email typically creates at least two copies of itself: one in the sender’s sent-items folder and another in the recipient’s inbox. The effect is multiplied when the original email has many recipients or file attachments. Deduplication ensures that only one master copy of each item is subject to review.
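A common way to implement deduplication is to fingerprint each item with a cryptographic hash and keep the first item seen for each fingerprint as the master copy. Because the sender’s and recipient’s copies of an email live in different mailboxes and can differ in storage metadata, the sketch below hashes a normalized set of message fields instead of raw file bytes. The field list and normalization rules are illustrative assumptions; commercial tools use their own, often configurable, field sets.

```python
import hashlib


def email_fingerprint(sender, recipients, subject, sent, body):
    """Build a normalized byte string from an assumed set of matching fields."""
    parts = [
        sender.strip().lower(),
        ",".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip(),
        sent,  # assumed to be an ISO 8601 timestamp already normalized to UTC
        body.strip(),
    ]
    # The ASCII unit separator keeps field boundaries unambiguous.
    return "\x1f".join(parts).encode("utf-8")


def deduplicate(items):
    """Return (masters, duplicates) for (item_id, content_bytes) pairs.

    masters lists the first item_id seen for each distinct hash; duplicates
    maps every later copy to the master it duplicates.
    """
    first_seen = {}
    masters, duplicates = [], {}
    for item_id, content in items:
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicates[item_id] = first_seen[digest]
        else:
            first_seen[digest] = item_id
            masters.append(item_id)
    return masters, duplicates


if __name__ == "__main__":
    sent_copy = email_fingerprint(
        "Ann@corp.com", ["bob@corp.com"], "Q3 plan", "2021-01-05T15:00Z", "See attached.")
    inbox_copy = email_fingerprint(
        "ann@corp.com", ["Bob@corp.com"], "Q3 plan", "2021-01-05T15:00Z", "See attached.\n")
    masters, duplicates = deduplicate(
        [("ann-sent-0001", sent_copy), ("bob-inbox-0417", inbox_copy)])
    print(masters)     # ['ann-sent-0001']
    print(duplicates)  # {'bob-inbox-0417': 'ann-sent-0001'}
```

Keeping the `duplicates` map, rather than discarding copies outright, preserves a record of where each copy was found, which is useful later when analyzing custodians.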

4. Email Threading

This technique uses contextual information embedded in the metadata of email files to re-organize them into conversation threads. That means reviewers can see the chronological progression of a conversation, which can make a big difference in the speed and accuracy of an email review.

Imagine, for example, that your reviewers identify a “hot” keyword in a question posed in the first email in a conversation between three participants. Later in the thread, the second and third participants reply, “Yes,” and “Can you elaborate?”

Reviewed separately without the context provided by email threading, those second and third messages would seem insignificant. But if a reviewer read them in order, she would immediately know to flag them as responsive. In fact, the whole thread could be flagged for closer examination.

Threading can also enable teams to review fewer emails overall by focusing on the last email in a thread, which often contains a record of the entire conversation.
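Email clients link replies through standard headers: each message carries a `Message-ID`, and a reply normally names its parent in `In-Reply-To` (and its ancestors in `References`), as described in RFC 5322. The sketch below groups messages by following `In-Reply-To` links back to the earliest message present in the collection, then sorts each thread chronologically so the last message is easy to pick out. The `Email` record is a hypothetical structure holding fields already extracted from the headers; production tools typically add further logic for missing or inconsistent headers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    """Hypothetical record of header fields already extracted from a message."""
    message_id: str
    in_reply_to: Optional[str]
    sent: str  # ISO 8601 in one timezone, so string order is chronological order
    body: str


def thread_root(message_id, by_id):
    """Walk In-Reply-To links upward to the oldest ancestor present in the collection."""
    visited = set()
    current = by_id[message_id]
    # Stop at a parent that is missing from the collection, or if malformed headers form a loop.
    while current.in_reply_to in by_id and current.message_id not in visited:
        visited.add(current.message_id)
        current = by_id[current.in_reply_to]
    return current.message_id


def build_threads(emails):
    """Map each thread's root Message-ID to its messages in chronological order."""
    by_id = {e.message_id: e for e in emails}
    threads = {}
    for e in emails:
        threads.setdefault(thread_root(e.message_id, by_id), []).append(e)
    for messages in threads.values():
        messages.sort(key=lambda e: e.sent)
    return threads


if __name__ == "__main__":
    emails = [
        Email("<3@corp>", "<2@corp>", "2021-01-04T11:00Z", "Can you elaborate?"),
        Email("<1@corp>", None, "2021-01-04T09:00Z", "Should we move the rebate dates?"),
        Email("<2@corp>", "<1@corp>", "2021-01-04T10:00Z", "Yes"),
    ]
    for root, messages in build_threads(emails).items():
        print(root, [m.body for m in messages])
        print("last message:", messages[-1].message_id)
```

Read in isolation, “Yes” and “Can you elaborate?” carry no meaning; grouped under their root message they read as part of the original question.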

5. Custodial Analysis

A custodian is the “owner” of a particular piece of ESI – for example, the custodian of an email is the owner of the mailbox from which it was collected.

While the author of an email is the sender, the custodian of that email is the person with access to the mailbox file that contains the message – and they’re not always the same.

A custodial analysis can determine if a custodian possesses data from any of the important people in a case. It can also determine if multiple custodians are in possession of duplicate documents, preventing unnecessary over-collection and undue data processing.
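Both checks reduce to simple set operations once each custodian’s collection has been summarized. This sketch assumes a hypothetical input that maps each custodian to `(document_hash, author)` pairs, with hashes computed from document contents and key people identified by email address; all names and data are illustrative.

```python
from collections import defaultdict


def custodial_analysis(holdings, key_people):
    """Summarize which key people appear in each custodian's data and which documents overlap.

    holdings: dict mapping custodian name -> list of (document_hash, author_address).
    Returns (coverage, shared):
      coverage maps each custodian to the sorted key people whose documents they hold;
      shared maps each document hash held by two or more custodians to those custodians.
    """
    key = {person.lower() for person in key_people}
    coverage = {}
    holders = defaultdict(set)
    for custodian, documents in holdings.items():
        coverage[custodian] = sorted({a.lower() for _, a in documents if a.lower() in key})
        for doc_hash, _ in documents:
            holders[doc_hash].add(custodian)
    shared = {h: sorted(c) for h, c in holders.items() if len(c) > 1}
    return coverage, shared


if __name__ == "__main__":
    holdings = {
        "Custodian A": [("h1", "ceo@corp.com"), ("h2", "intern@corp.com")],
        "Custodian B": [("h1", "ceo@corp.com"), ("h3", "cfo@corp.com")],
        "Custodian C": [("h4", "vendor@other.com")],
    }
    coverage, shared = custodial_analysis(holdings, ["ceo@corp.com", "cfo@corp.com"])
    print(coverage)  # Custodian C holds nothing from the key people
    print(shared)    # {'h1': ['Custodian A', 'Custodian B']}
```

A custodian with an empty coverage list, or one whose documents mostly appear in `shared`, is a candidate for a narrower collection, though whether to narrow it remains a case-strategy decision.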

6. Search Term Filtering

Just as in review, data can be pared down to some degree by keyword searches during Early Data Assessment. One key advantage of doing so is minimizing the per-GB processing fees charged by most ediscovery providers.

Let’s say you’re looking into financial documents involving three people. You also know that one of those people frequently exchanged gigantic spreadsheets about fantasy football with one of the mailbox custodians, and those files are clearly irrelevant to the discovery request. To avoid the cost of bringing them into discovery, you could create search filters that exclude from review any email whose subject includes “ESPN Fantasy Football” or whose sender domain is “espn.com”.
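The fantasy-football exclusion can be written as two rules: a case-insensitive subject phrase match and a sender-domain match. This sketch assumes messages arrive as dictionaries holding `subject` and `from` header values. It uses Python’s standard `email.utils.parseaddr` to extract the address from a header such as `League Manager <noreply@espn.com>`, and it treats subdomains such as `mail.espn.com` as matches, which is a design choice rather than something the example requires.

```python
from email.utils import parseaddr

# Exclusion rules from the fantasy-football example.
EXCLUDE_SUBJECT_PHRASES = ["ESPN Fantasy Football"]
EXCLUDE_SENDER_DOMAINS = ["espn.com"]


def sender_domain(from_header):
    """Lowercased domain of the address in a From header, or '' if there is none."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower()


def matches_exclusion(email):
    """True if the subject contains an excluded phrase or the sender is in an excluded domain."""
    subject = email.get("subject", "").lower()
    if any(phrase.lower() in subject for phrase in EXCLUDE_SUBJECT_PHRASES):
        return True
    domain = sender_domain(email.get("from", ""))
    # Exact domain or a subdomain of it; "notespn.com" does not match "espn.com".
    return any(domain == d or domain.endswith("." + d) for d in EXCLUDE_SENDER_DOMAINS)


def apply_exclusions(emails):
    """Split emails into (kept, excluded) lists."""
    kept, excluded = [], []
    for email in emails:
        if matches_exclusion(email):
            excluded.append(email)
        else:
            kept.append(email)
    return kept, excluded


if __name__ == "__main__":
    inbox = [
        {"subject": "Week 3 ESPN Fantasy Football standings", "from": "pal@corp.com"},
        {"subject": "Trade offer", "from": "League Manager <noreply@espn.com>"},
        {"subject": "Q3 revenue", "from": "cfo@corp.com"},
    ]
    kept, excluded = apply_exclusions(inbox)
    print([e["subject"] for e in kept])  # ['Q3 revenue']
```

Running a filter like this before items are loaded for processing is what avoids the per-GB fees; applying it afterward would only save review time.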

7. Selective Sets

Finally, advanced EDA tools often include “selective set” functionality, which enables data experts to filter data using clever combinations of all the techniques above. In Data Mining, these sets are called “slices,” combining various criteria like search terms, custodian or date range.

What makes selective sets truly powerful is the ability to create a selective set of other selective sets – a process sometimes referred to as “stacking.”

For example, you may have two sets with conditions like this:

  1. Emails sent by or sent to Employee F or Employee G with XLS, XLSX, or CSV attachments

  2. Emails kept by Custodian D before June 1, 2019 with PDF, PSD, TIFF, DICOM, XCF, PPT, or PPTX file extensions

With the right software tools, a data analyst can search for documents that match Set 1 but not Set 2, or documents that match both Set 1 and Set 2.
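One way to model stacking is to treat each selective set as a predicate that yields a set of document IDs, then combine those sets with ordinary set algebra: difference for “in Set 1 but not Set 2,” intersection for “in both.” The sketch below encodes the two example sets. The email addresses, record fields, and the mapping of DICOM to the `.dcm` extension and TIFF to `.tif`/`.tiff` are assumptions for illustration, not how Data Mining defines slices.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical addresses standing in for Employee F and Employee G.
SET_1_PEOPLE = {"employee.f@corp.com", "employee.g@corp.com"}
SET_1_EXTS = {"xls", "xlsx", "csv"}
# Assumed extensions: DICOM files commonly use .dcm; TIFF appears as .tif or .tiff.
SET_2_EXTS = {"pdf", "psd", "tif", "tiff", "dcm", "xcf", "ppt", "pptx"}


def extensions(doc):
    """Lowercased attachment extensions without the leading dot."""
    return {PurePosixPath(name).suffix.lower().lstrip(".") for name in doc["attachments"]}


def in_set_1(doc):
    """Sent by or to Employee F or G, with a spreadsheet attachment."""
    participants = {doc["from"].lower(), *(a.lower() for a in doc["to"])}
    return bool(participants & SET_1_PEOPLE) and bool(extensions(doc) & SET_1_EXTS)


def in_set_2(doc):
    """Kept by Custodian D before June 1, 2019, with one of the listed file types."""
    return (
        doc["custodian"] == "Custodian D"
        and doc["date"] < date(2019, 6, 1)
        and bool(extensions(doc) & SET_2_EXTS)
    )


def selective_set(docs, predicate):
    """IDs of documents matching the predicate."""
    return {doc["id"] for doc in docs if predicate(doc)}


if __name__ == "__main__":
    docs = [
        {"id": 1, "from": "employee.f@corp.com", "to": ["x@corp.com"], "custodian": "Custodian D",
         "date": date(2019, 3, 1), "attachments": ["budget.xlsx", "deck.pptx"]},
        {"id": 2, "from": "y@corp.com", "to": ["Employee.G@corp.com"], "custodian": "Custodian A",
         "date": date(2019, 8, 1), "attachments": ["data.CSV"]},
    ]
    set_1 = selective_set(docs, in_set_1)
    set_2 = selective_set(docs, in_set_2)
    print(sorted(set_1 - set_2))  # [2]: in Set 1 but not Set 2
    print(sorted(set_1 & set_2))  # [1]: in both sets
```

Because each result is an ordinary set, a stacked result can be stacked again, for example `(set_1 - set_2) & set_3` for some further hypothetical `set_3`.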

In summary, using a broad range of EDA techniques and selective sets can help you greatly minimize your data burden and avoid a never-ending ediscovery horror story. In our era of digital evidence, a little data science know-how can save a whole lot of money.
