Preserving Collaborative Content: 6 Common Challenges Collecting Data From Google Drive

Hanzo

At Hanzo, we talk a lot about the collaborative content companies generate in communication apps like Slack. However, instant messaging isn’t the only way collaboration happens, and Slack isn’t the only place where team members work together. It’s likely your team also collaborates on documents and spreadsheets through platforms like Google Workspace (formerly known as G Suite).

As with messages in Slack, the content within Google Workspace may be subject to litigation hold obligations if it’s relevant to any pending or anticipated legal matters. And—again, as with Slack—the built-in tools within Google may not be up to the challenge of corporate ediscovery.

Let’s take a closer look at the ediscovery challenges posed by modern collaboration tools like Google Workspace.

SITUATION

Before we begin, if there’s no potentially discoverable information in Google Workspace, then there’s no obligation to preserve that information, and there’s no need to talk about it. Suppose your team never uses Google Workspace to store content related to customer service, product design or liability, intellectual property, marketing, or internal operations. In that case, you can probably stop reading this and carry on not worrying about how to preserve content from Google Docs or Google Sheets.

But what if you store contracts in Google Drive? What if you draft and revise product spec sheets in Google Docs where your teams insert edits or comments as issues arise and are resolved? What if you use Google Drive to store a running document where you manage your customer service tickets?

Organizations that use Google Workspace in business know that any of that content could be responsive to litigation or needed for an internal investigation. Under Federal Rule of Civil Procedure (FRCP) 37(e), a party can be sanctioned if it fails to take reasonable steps to preserve electronically stored information (ESI) once it reasonably anticipates litigation. To avoid these sanctions and meet their preservation requirements, corporations must plan how to defensibly, responsibly, and proactively manage their information.

To add a layer of complication, as we’ve already hinted at, there’s more to preserving Google Workspace content than just being able to access the current version of a document or spreadsheet. You may need to preserve and review the specific version that existed when someone linked to that document from Slack or when a designated team member made changes to the document. You may need to preserve, collect, and review content from one person or one particular week.

Google Vault allows an organization to retain and preserve the content it holds within Google Workspace. As with most built-in tools, however, it wasn’t tailor-made for corporate ediscovery; rather, it’s geared more toward occasional export or compliance needs. In many cases, a tool that was built to be “good enough” turns out to be anything but. Here, Vault falls far short of delivering the information that organizations need for information governance and ediscovery.

Here are six common challenges with preserving, collecting, and reviewing potentially discoverable information from Google Workspace.

CHALLENGES

1) The sheer volume of data on Google Drive.

Imagine your organization only had five users on Google Drive and each of those users only had five documents. In that case, you might not need a specialized solution to identify, preserve, and collect discoverable information from those documents. It’s not too hard to manually search through 25 documents, right? This math could get out of hand quickly, though, if your team updates each of those 25 documents every working day. After four five-day weeks of daily updating, you wouldn’t have just 25 documents; you’d have 20 distinct versions of each, for a total of 500 document versions. And that’s if your teams were only updating each document once per day, instead of multiple times per day, as is often the case with collaborative content. In reality, most organizations have hundreds of documents, and an overwhelming volume of data, in Google Workspace.
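The arithmetic above can be checked in a few lines of Python. This is only a sketch of the example in the text: the function name and parameters are illustrative, and it assumes every update produces exactly one saved version and that a week has five working days.

```python
def total_versions(users, docs_per_user, updates_per_day, working_days):
    """Count document versions, assuming each update creates one version."""
    documents = users * docs_per_user
    return documents * updates_per_day * working_days


# Four weeks at five working days per week = 20 working days.
print(total_versions(users=5, docs_per_user=5, updates_per_day=1, working_days=20))  # 500

# Three updates per day triples the count.
print(total_versions(users=5, docs_per_user=5, updates_per_day=3, working_days=20))  # 1500
```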

2) Insufficiently fine control over preservation and collection.

Ediscovery professionals know that relentless culling of extraneous data is the key to affordable and manageable ediscovery. But Google Vault doesn’t give the user file-by-file control over exports. So, while you can use Google Vault to export content from Google Workspace, it may not be the best approach. You may only need to export one folder or one day’s worth of content from one user’s Drive, but with Google Vault, you’ll need to export the entire Drive. Not only is that inefficient in terms of time, but it also results in downloading and preserving a tremendous excess volume of data. Multiply that by several users in an average ediscovery matter or internal investigation, and you’ll quickly wish you had more granular control over which files you preserved and which you collected.
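To make "granular control" concrete, here is a minimal sketch of the kind of scoping described above: one custodian, one folder, one date range. It works on a hypothetical in-memory inventory and calls no Google API; the `DriveFile` record, its fields, and `select_in_scope` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DriveFile:
    # Hypothetical inventory record; not a Google Drive or Vault data type.
    custodian: str
    folder: str
    modified: date
    name: str


def select_in_scope(files, custodian, folder, start, end):
    """Keep files owned by one custodian, under one folder, in a date range."""
    prefix = folder.rstrip("/") + "/"
    return [
        f for f in files
        if f.custodian == custodian
        and (f.folder == folder or f.folder.startswith(prefix))
        and start <= f.modified <= end
    ]


files = [
    DriveFile("pat@example.com", "/Contracts/2021", date(2021, 3, 2), "msa.docx"),
    DriveFile("pat@example.com", "/Personal", date(2021, 3, 2), "notes.docx"),
    DriveFile("sam@example.com", "/Contracts/2021", date(2021, 3, 2), "nda.docx"),
    DriveFile("pat@example.com", "/Contracts/2021", date(2021, 4, 9), "renewal.docx"),
]

selected = select_in_scope(
    files,
    custodian="pat@example.com",
    folder="/Contracts",
    start=date(2021, 3, 1),
    end=date(2021, 3, 7),
)
print([f.name for f in selected])  # ['msa.docx']
```

Only one of the four files survives the filter, which is the point: the narrower the collection criteria, the less excess data ends up preserved and reviewed.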

3) Lack of familiarity with data structure and navigation.

Okay, so you can’t limit your export all that well, but at least you can navigate through Google Drive to find the content you need, right? Well, maybe. The problem here is that the individual user—the person who set up their Google Drive folders and data structure—isn’t usually the person who’s exporting that data. Instead, you probably have someone from the IT department or someone in compliance or legal setting up your data exports. Whoever it is, it’s unlikely that they’re familiar with where specific files are located, which makes navigating Drive a challenge.

4) An export format that isn’t geared toward ediscovery.

Formats matter. Just ask any ediscovery expert who has had to painstakingly load data from one platform into a review tool that requires a different format. Valuable time and effort are often spent simply making exported data usable in the review tool. For loose file exports from Google Drive, the format of a Google Vault export is a poor fit for review. In short, Google Vault exports provide metadata in separate XML files, while the loose documents themselves are exported with both the file name and the internal Google Doc ID. For a review platform to treat each document and its metadata as one record, the user must reassemble them. This drastically increases the amount of time and effort needed to prepare data for review.
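The reassembly step might look something like the sketch below, which flattens metadata XML into CSV rows that a review tool could load alongside the files. The XML layout here (`Documents`, `Document`, `DocID`, `Title`, `Author`, `FileName`) is invented for illustration and is not Vault's documented schema, and the sample values are made up. A real export would need its own field mapping.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata file. Element and attribute names are
# assumptions for this sketch, not the actual Google Vault export schema.
SAMPLE_XML = """<Documents>
  <Document DocID="1AbC">
    <Title>Product spec</Title>
    <Author>pat@example.com</Author>
    <FileName>Product spec_1AbC.docx</FileName>
  </Document>
  <Document DocID="2XyZ">
    <Title>Support tickets</Title>
    <Author>sam@example.com</Author>
    <FileName>Support tickets_2XyZ.xlsx</FileName>
  </Document>
</Documents>"""

FIELDS = ["doc_id", "title", "author", "file_name"]


def build_rows(xml_text):
    """Turn each <Document> element into one flat record."""
    root = ET.fromstring(xml_text)
    return [
        {
            "doc_id": doc.get("DocID", ""),
            "title": doc.findtext("Title", default=""),
            "author": doc.findtext("Author", default=""),
            "file_name": doc.findtext("FileName", default=""),
        }
        for doc in root.iter("Document")
    ]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_rows(SAMPLE_XML)
print(to_csv(rows))
```

Even in this toy form, someone has to decide which XML field maps to which load-file column, and that mapping work is the overhead the paragraph above describes.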

5) Metadata loss at export.

File metadata is a critical search component when trying to identify which data individual custodians have interacted with. An unfortunate downside to Google Vault, as alluded to above, is how it handles the metadata of exported files. In addition to separating metadata from exported files, Google Vault either fails to deliver some critical metadata or alters it, changing document creation dates upon export. This is bad news for the ediscovery process, where metadata plays an important role in early case assessment as well as in review and production.

6) Difficulty identifying the relevant version of a given file.

We touched on versioning in the first challenge, when discussing the overwhelming volume of data that can be contained within a single user’s Google Drive account. But there’s another problem with versioning: how do you know that you’re getting the right version of a document, edited by the right person on the right date? That relies on either inside knowledge about the document’s history (which, again, an IT tech is unlikely to have) or access to metadata that a Google Vault export will modify or omit. To scope their discovery efforts during early case assessment, ediscovery professionals need not only a document’s version history but also, and more importantly, a record of each custodian’s viewing and editing history. Google Vault does not provide access to this crucial information.
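If a version history were available, picking "the right version on the right date" reduces to a lookup by timestamp, for example the version in effect when someone linked to the document from Slack. The sketch below assumes a hypothetical history of `(saved_at, editor, version_id)` tuples sorted by time; the data, the tuple layout, and `version_as_of` are illustrative and do not come from any Google export.

```python
from bisect import bisect_right
from datetime import datetime


def version_as_of(history, moment):
    """Return the latest version saved at or before `moment`, or None.

    `history` is a list of (saved_at, editor, version_id) tuples,
    sorted by saved_at in ascending order.
    """
    saved_times = [saved_at for saved_at, _, _ in history]
    index = bisect_right(saved_times, moment)
    return history[index - 1] if index else None


# Hypothetical version history for one document.
history = [
    (datetime(2021, 3, 1, 9, 0), "pat@example.com", "v1"),
    (datetime(2021, 3, 2, 14, 30), "sam@example.com", "v2"),
    (datetime(2021, 3, 5, 11, 15), "pat@example.com", "v3"),
]

link_posted = datetime(2021, 3, 3, 10, 0)
match = version_as_of(history, link_posted)
print(match[2], match[1])  # v2 sam@example.com
```

The lookup itself is trivial. The hard part, as described above, is getting a trustworthy history of who changed what and when in the first place.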

The solution to all of these problems is a purpose-built ediscovery and investigation tool for Google Workspace.

Written by:

Hanzo