Remember the original internet, with its clunky graphics and minimal information? You might not remember doing much online back in the 1990s, because frankly there wasn’t yet a lot of worthwhile content to engage with.
But 30 years later, it’s a whole different story. Now, the internet is integral to the daily life of people around the world. We stay in touch with our friends and family online, buy products from suppliers we’ve never met, chat with marriage counselors who live thousands of miles away, keep up with the news, and dig up mountains of information about companies, services, and celebrities. Every minute, consumers spend about $750,000 online. Facebook now has over 2 billion monthly active users, with U.S. users spending about 40 minutes a day engaging with their accounts.
Today, the web is a rich, varied, dynamic data source—but ediscovery hasn’t caught up to its evolution quite yet. If you’re not getting discoverable data from online sources, you might be missing out. Let’s take a look at how we got here.
Time Machine: Remembering the Original Internet
The internet has become so central to our existence that it’s a bit hard to remember how it used to be.
While the technology underlying the internet was developed in the 1980s, the internet truly reached the masses around 1990, when Tim Berners-Lee launched the World Wide Web. In 1995, only 0.4 percent of the world’s population had internet access. According to the Pew Research Center, 14 percent of U.S. adults had internet access in 1995, but most of those “were using slow, dial-up modem connections.” (If you don’t remember the headaches of dial-up, be grateful.) At that time, “42 percent of U.S. adults had never heard of the internet and an additional 21 percent were vague on the concept.”
Google—without which some of us apparently can’t function today—didn’t even exist as a company until 1998. Its website then was known as “Google!” and was offered only in beta format. Facebook didn’t launch until 2004, when it was called “TheFacebook.com,” and although it grew to 1 million users that first year, it played second fiddle to MySpace, with 5 million. Many businesses barely had websites at all; what they did have acted as a “yellow pages” listing so that people could call the business or visit it in person.
Obviously, things have changed.
Today, Life Happens Online
Around 55 percent of the world’s entire population now has access to the internet. Pew reports that one third of U.S. cell phone owners primarily access the internet through their cell phones—which 90 percent of U.S. adults now have. In fact, by 2014, “87 percent of U.S. adults sa[id] they use the internet, at least occasionally—the highest percentage captured in a Pew Research Center poll” since it started measuring internet access in 1995.
These days, business happens on the web. Companies market themselves, communicate with customers, and build their brands and audiences primarily online. People are no different: we live much of our lives on the web—or at least we show the world the lives we wish we were living.
We get a significant portion of our knowledge of news and current events from social media platforms; Pew reports that 68 percent of U.S. adults learn about the news on social media. In fact, the President’s tweets are considered official statements, subject to presidential record-keeping requirements.
What that means, though, is that businesses need to incorporate all of that web-based discoverable data into their strategic ediscovery plans.
Ediscovery Should Happen Online Too
It’s time to acknowledge that there’s a world of discoverable data online—on company websites, in YouTube videos, on blogs, in social media channels, in chat rooms and Twitter feeds and cloud-hosted collaboration channels. It’s time to learn how to identify that data, capture it, and feed it into the increasingly sophisticated ediscovery management and review tools that we’ve already created.
Twenty years ago, we didn’t have ediscovery; everything was done on paper. Even information that was stored electronically would be printed out so it could be handled with the rest of the paper stream. Lawyers would simply agree with one another to leave email off the table during discovery. No one knew how to manage it or what it might tell them, so everyone pretended it didn’t exist.
Wait—does that sound familiar? That’s precisely what companies are doing today with web-based evidence. When it comes to ediscovery, many businesses treat the internet like it’s still 1999. They don’t know what might be online or how they would deal with that information if they found it. As a result, they tell themselves that nothing terribly important is happening online, allowing them to justify ignoring online data preservation and collection.
It was just over 12 years ago that the Federal Rules of Civil Procedure first acknowledged the importance of electronically stored information (ESI), launching the field of ediscovery. In the dozen years since, we’ve undergone a sea change: ediscovery has grown from nonexistent to a massive $10 billion industry.
And with the rate of technological change constantly accelerating, the changes over the next 10 years will likely be even more surprising and disruptive. Try to imagine ediscovery today without email. Odds are, that’s what ediscovery without the web is going to feel like, and soon.
Will you be ready?
A foundational question indeed. Capturing context is at the heart of Hanzo's mission. The cofounders were tasked with recording a contextual experience of the web that British citizens could look back on and analyze. As they worked on this challenge, they ran into one persistent problem: every off-the-shelf tool and utility at that time rendered a static, flat PDF or extracted text into some garbled and unreadable form, removing almost all of the dynamic nature of the web.
The founders realized two important things at the time:
1. First, the web was not becoming simpler. In fact, it was growing more complex by the day. The mission required a web-crawling solution that captured everything (metadata, JavaScript, dynamic text, videos) for playback in native format, so that future generations had context.
2. Second, collecting web data in its native format was the only true way to provide the most compelling experience. It was also the best way to demonstrate the defensibility of that data, which raises evidentiary standards, and to deliver quality insights and analytics on the collected content in support of proactive ediscovery.
Here’s What You Need to Make It Happen
For starters, you should be ready to preserve and collect your organization’s own website, blog, social media platforms, online messages, and any other business communication or potentially relevant information, wherever it might appear. But even that wide net won’t capture everything that might be useful. How do you identify additional evidence when faced with an investigation or a potential litigation matter?
Obviously, no one can read everything that’s on the internet—globally, we’re creating more than 2.5 quintillion bytes of data per day. But the growth in technology that got us into this morass of data is exactly what can get us back out. With artificial intelligence (AI), we’ve built investigation tools that can scour the web for potentially relevant evidence.
Identification of evidence is only the first step in ediscovery, though. From there, you need a method for website capture that can collect evidence in its original, dynamic format, analyze and review it, and ultimately present it to a factfinder in the most compelling format. That’s why we use native-format WARC (Web ARChive) files to create our archives: they preserve and play back every aspect of the interactive elements and dynamic components found on modern websites, including links, videos, dropdown menus, and more. And our archives are compatible with major review platforms like Relativity, so you can incorporate online evidence into your existing ediscovery workflow.
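To make the idea of a native-format archive a little more concrete, here is a minimal sketch of what a single WARC record looks like on disk: a version line, a few named header fields, a blank line, and then the captured HTTP response stored byte for byte. This is an illustration only, not a description of how any particular capture product works. The function names `build_warc_record` and `read_warc_record` are hypothetical, the example uses only the Python standard library, and it skips things a production crawler would handle (compression, request and metadata records, payload digests). In practice you would rely on an established WARC library rather than code like this.

```python
import io
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_payload: bytes) -> bytes:
    """Serialize one simplified WARC/1.0 'response' record.

    http_payload is the raw HTTP response (status line, headers, body)
    exactly as it came off the wire; it is stored unmodified.
    """
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = "WARC/1.0\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in headers)
    head += "\r\n"  # blank line separates the header from the content block
    # Each record ends with two CRLF pairs after the content block.
    return head.encode("utf-8") + http_payload + b"\r\n\r\n"


def read_warc_record(stream):
    """Read one record from a binary stream; return (headers dict, payload bytes)."""
    version = stream.readline().strip()
    if version != b"WARC/1.0":
        raise ValueError(f"unexpected version line: {version!r}")
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # blank line (or end of stream) ends the header
            break
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()
    payload = stream.read(int(headers["Content-Length"]))
    stream.read(4)  # consume the trailing CRLF CRLF
    return headers, payload


if __name__ == "__main__":
    # Hypothetical captured response, for illustration only.
    captured = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"<html><body>Hello</body></html>"
    )
    record = build_warc_record("http://example.com/", captured)
    parsed_headers, parsed_payload = read_warc_record(io.BytesIO(record))
    print(parsed_headers["WARC-Target-URI"], len(parsed_payload))
```

The point of the sketch is that the payload comes back exactly as it was captured, alongside the target URI and capture date. Keeping the original bytes and their context together is what makes native-format playback possible, in contrast to a flattened PDF or extracted text.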
In just 30 years, the internet has grown from practically nothing to a dominant force in the daily lives of people—and businesses—everywhere. (How many 30-year-olds can say the same?) It’s time for ediscovery to embrace the internet as the essential data source that it is, and start identifying, preserving, and capturing dynamic online content and incorporating it into familiar workflows and review platforms.
After all, we’re not in the Stone Age of the internet anymore; you should neither ignore online evidence nor limit yourself to a Flintstones-equivalent web capture method.