It’s that time of year again, when people pull out their crystal balls and do their best to predict what we’ll see in the ediscovery industry in 2023. After all, who doesn’t want to know what to look out for down the road? In that spirit, here are some things Hanzo has been paying attention to as we move into the new year.
Data Sources Outside the Norm
New data sources should probably make the list every year, but the shift taking place now is slightly different from what we’ve seen in the past. Messaging platforms like Slack and Teams have been the talk of the town over the past few years, and while that type of data still poses many ediscovery challenges, these platforms are no longer the new kids on the block; they have become a fairly regular part of the ediscovery workflow.
What has emerged as a new challenge is collecting data from other SaaS collaboration applications like Jira, Confluence, Asana, and Salesforce. Reports have shown that an average enterprise might be using up to 200 different SaaS applications at one time. With a 60% yearly churn rate, applications are constantly being adopted and then replaced by something new, which makes it hard to understand your organization’s data landscape.
But the bigger challenge is collecting data and metadata from these sources. There is often an export function of some sort, but it doesn’t capture the interface, which itself often holds data and the context needed to understand that data. Screenshots have been used to supplement these raw data exports, but they don’t always satisfy the rules for best evidence.
So as we see more data from these sources requested for discovery, legal teams will need to find solutions to easily collect the data and metadata from the ever-changing SaaS datascape.
Higher Data Complexity with Large Enterprises
When collaboration messaging apps like Slack and Teams first appeared on the radar of legal teams, the first order of business was simply collecting the data in a way that met evidentiary requirements. As I mentioned above, this task is increasingly part of the standard operating procedure for litigation teams. However, industry studies are beginning to show that when it comes to these data sources, it’s not a one-size-fits-all situation.
Organizations with only a few legal matters a year aren’t going to face the same challenges as large multinational enterprises. So a solution that works for a smaller company regarding the discovery of their messaging data won’t be enough to handle the challenges of a much larger entity with multiple ongoing matters.
A great example of the complexity of a large messaging environment came from a recent webinar Hanzo conducted with ACEDS, in which Larry Briggi of IBM stated that he has a single custodian connected to over 8,000 unique Slack channels [with each channel comprised of countless messages], and that a single matter involving 26 custodians had 86 million messages connected to those custodians.
With a dataset this large and this complex, a simple collect-everything approach will not suffice when it comes to finding relevant ESI.
Continued Development of AI
Artificial intelligence makes lists like these every year, so why should this one be an exception? For this list, I’m going to focus on how AI can help with some of the problems cited above.
The nature of messaging apps vs. email creates some unique challenges when it comes to scoping matters, locating relevant ESI, and culling data down to usable sizes. One of the most interesting things to me is the relationship between data export sizes and the number of messages.
In one example from Hanzo’s data science department, a Slack export of only 725 MB (that’s right, megabytes) of plain text yielded over 18 million messages. So on the surface, the data export is very small compared to an average email export for discovery, but the number of messages contained within that export is huge. One might think an export falling under a single gigabyte would be fine for manual culling and review, only to find 18 million messages to be sifted through.
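To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes 1 MB means 1,000,000 bytes and treats "over 18 million" as exactly 18,000,000, so the result is an upper bound on the average. The function and constant names are illustrative and do not come from Hanzo's tooling.

```python
# Rough arithmetic on the export figures quoted above.
# Assumptions: 1 MB = 1,000,000 bytes; "over 18 million" is taken as 18,000,000.
EXPORT_SIZE_MB = 725
MESSAGE_COUNT = 18_000_000
BYTES_PER_MB = 1_000_000


def average_bytes_per_message(size_mb, message_count, bytes_per_mb=BYTES_PER_MB):
    """Return the average number of export bytes per message."""
    return size_mb * bytes_per_mb / message_count


avg = average_bytes_per_message(EXPORT_SIZE_MB, MESSAGE_COUNT)
print(f"About {avg:.0f} bytes of export per message")
```

At roughly 40 bytes per message, the size of an export says very little about how much review work it contains.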
If an organization had only a few matters per year, this might still be tenable. But a large corporate enterprise with large messaging instances will need additional help to parse through data, map channels and custodians, detect and locate PII and other sensitive information, and conduct keyword and entity searches on large sets of unstructured chat conversations. Those conversations don’t have the benefit of the standard organizing fields (To, From, Subject, Date) that all emails carry. So as messaging continues to become the standard in enterprise communications, more advanced tools are going to be needed to comb through the mess.
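As a very small illustration of the tasks described above, the sketch below maps channels to the people who posted in them, runs a keyword search, and flags one kind of sensitive data. Everything in it is hypothetical: the record fields (`channel`, `user`, `text`), the sample messages, and the deliberately simplistic phone-number pattern are assumptions made for the example, not the format of any real export or the approach of any particular tool.

```python
import re
from collections import defaultdict

# Hypothetical message records; a real export's field names will differ.
messages = [
    {"channel": "proj-apollo", "user": "alice", "text": "Contract draft is ready for review"},
    {"channel": "proj-apollo", "user": "bob", "text": "Call me at 555-867-5309 about the contract"},
    {"channel": "random", "user": "alice", "text": "Lunch?"},
]


def map_channels_to_custodians(msgs):
    """Group the users who posted in each channel."""
    mapping = defaultdict(set)
    for m in msgs:
        mapping[m["channel"]].add(m["user"])
    return dict(mapping)


def keyword_hits(msgs, keyword):
    """Return messages containing the keyword as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [m for m in msgs if pattern.search(m["text"])]


# Deliberately simplistic pattern for illustration; real PII detection needs far more.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def flag_phone_numbers(msgs):
    """Return messages whose text matches the simple phone-number pattern."""
    return [m for m in msgs if PHONE.search(m["text"])]


channel_map = map_channels_to_custodians(messages)
contract_hits = keyword_hits(messages, "contract")
phone_hits = flag_phone_numbers(messages)
```

Even this toy version shows why chat data is harder to work with than email: the channel and author have to stand in for the To, From, and Subject fields, and everything else has to be inferred from the message text itself.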
Conclusion
Of course, this list of trends is far from exhaustive, so don’t mistake these for the only important things to watch in the coming year.