[authors: John Brewer, David Cohen*, Maura Grossman**, Mary Mack*** and Kaylee Walstad****]
Editor’s Note: There’s much conversation around how our industry should best use generative artificial intelligence (gen AI) to do our jobs. On March 19, 2024, HaystackID sponsored an EDRM virtual workshop that explored how gen AI can transform document review and other legal processes. Expert panelists shared how they use this technology in their daily workflow.
While the panelists had unique takes on what gen AI means for our industry, they all highlighted the need for empirical evidence to support its effectiveness and discussed why eDiscovery professionals must understand these tools’ security and privacy implications. During the workshop, both panelists and attendees explored how these tools’ costs could be prohibitive for large cases and why AI requires human input and verification.
Access the virtual workshop’s on-demand version and follow along as industry leaders discuss this technology’s limitations, practical use cases, and untapped potential.
Expert Panelists
+ Mary Bennett
Senior Director, Content and Community Initiative, EDRM
Director, Content Marketing, HaystackID
+ John Brewer
Chief Artificial Intelligence Officer and Chief Data Scientist, HaystackID
+ David R. Cohen
Partner, Reed Smith, LLP
+ Maura R. Grossman
Principal, Maura Grossman Law
Research Professor, School of Computer Science, University of Waterloo
+ Mary Mack
CEO, Chief Legal Technologist, EDRM
+ Kaylee Walstad
Chief Strategy Officer, EDRM
Transcript
Moderator – Mary Bennett
Welcome to EDRM’s Workshop where we’re going to be exploring all things gen AI. I’m Mary. I’m today’s moderator, and I’m excited for a few reasons. One, we have a truly, truly excellent panel of experts on today’s call. They’ll be sharing high-level thoughts on gen AI, but we’ll also be getting into practical tips that hopefully you can implement in your day to day. Some of you might know me from my Relativity and Stellar Women days. I have since gone through a rebrand. I am Mary Bennett now. I got married, that’s why I rebranded. So, I’m also British. If that accent comes through, I’m just trying it on for size. So that is that, and thrilled to be moderating today and working with two of my friends and mentors, Kaylee and Mary. It has been just wonderful to work with them as I re-enter the world of legal tech and eDiscovery.
Some other quick news about me, I recently joined HaystackID, and that is a sponsor of today’s program. What we’re doing with HaystackID is we’re solving complex data challenges related to legal, compliance, regulatory, and cyber events. As the Director of Content, I’m thrilled to work under Rob Robinson, Erin Meyers, and many other fabulous team members, especially as I come back into the industry. A few housekeeping notes for how today will go before we rock and roll. We initially promoted this as having breakout rooms. We want it to be collaborative and have fluid discussion, but there are a lot of folks that signed up. In the interest of time and managing this program effectively, we’re not going to be doing breakout rooms here today. But we do want it to be collaborative, so you’ll see that chat feature on your screen.
We invite you to share your thoughts and comments throughout. Mary, Kaylee, a handful of others, and I will be monitoring that. If you ever feel like you want to come off mute to say your question or share your experience with gen AI, we kindly ask that you raise your hand using that emoji at the bottom of your screen, and we can unmute you. We will be recording today’s program. So just a note if you’re going off mute, we are recording, and everybody who attended and registered will get a copy of this recording in the days following the event. I’m going to turn it over to Ms. Mary Mack to give a quick thought on the Code of Conduct.
Mary Mack
Thanks, Mary. Kaylee and I are just so delighted to have all of you here. AI is the issue. It’s the shift of our times. The stakes couldn’t be higher for those of us in the legal and the document and production community to make a difference. AI, like TAR before, has passionate advocates for platforms, approaches, and validation. As we discuss, as we do with all our EDRM events, we encourage respectful communication according to our event code of conduct.
And it’s not a webinar today; it is a workshop designed for interaction. Our thanks to Mary Rechtoris Bennett for being our moderator and HaystackID for being the sponsor today as well. We have participants from 21 countries from Israel to Australia with our full signature multidisciplinary and full ecosystem participation. We thank our friends from across the globe who interrupted their sleep to be here live. Those of you gathered today will have varying degrees of comfort and facility with the AI. And we just want to remind no matter what level of expertise, everybody here has something to offer, and everybody here has something to learn. And with that, back to you, Mary.
Mary Bennett
Thank you, Mary. We will be planning a workshop for Q2, so just stay posted on social and email, and we’ll announce that. And that’s enough for me. I am sick of hearing my voice. So, I’m going to kick it over to our panel today. If you would all say your name, role, and a softball question just to get us started, what’s your most unique or controversial take on AI? And Maura, why don’t you kick us off here today?
Maura Grossman
I’m a professor in the School of Computer Science at the University of Waterloo, an Adjunct Professor of Law at Osgoode Hall Law School, and Principal at Maura Grossman Law. Among the panelists, I don’t want to say naysayer, but I’m holding back. I may be more [reserved] than most people. I want more benchmarking and I want more empirical evidence before I’m going to run first into this pot.
Mary Bennett
Thank you. Always good to have diverse points of view. Why don’t we kick it over to you, Dave?
Dave Cohen
Hi everybody, I’m Dave Cohen. I head up the Records & E-Discovery Practice Group at Reed Smith, which is a big group, around 70 lawyers strong, in our global law firm. And I also have the honor of chairing the EDRM Board of Project Trustees. I see Mary and Kaylee nodding their heads. I remembered to say that. I’m going to be the counterpoint for Maura here because I think that AI has tremendous possibilities, and it’s just in its infancy. We haven’t seen anything yet, but millions and billions of dollars are being invested. I expect very fast strides and, being an optimist, I’m hopeful that the benefits of AI are going to outweigh some of the detriments. When new technology comes along, there tends to be a lot of focus on the detriments. We certainly have to watch out for those and prepare for them as well.
Mary Bennett
Thank you, Dave. John, will you please go next?
John Brewer
Absolutely. I’m John Brewer. I’m the Chief Artificial Intelligence Officer at HaystackID. I come from the technology side. I’ve been in the data and AI space since about the late ’90s. And in terms of my views on gen AI, probably the view that I have ends up somewhere between Dave and Maura. I’m excited about generative AI. I think it’s a new tool in our toolbox. It alone is not the final breakthrough that we require for general AI or anything like that. It has interesting capabilities that nothing else that we’ve already had can match. And when we start plugging that together with some of the tools that we’ve had for 10, 20, in some cases 50 years, we’re going to see some really creative people doing some really exciting things with it over the next two years.
Mary Bennett
Great. Thank you, John. Mary Mack.
Mary Mack
Mary Mack, CEO and Chief Legal Technologist here at EDRM. I guess it’s not a controversial take. I think this is a revolutionary technology, and it’s a revolution every month almost keeping up with it. Those of us in the legal community are word wranglers, so we’re talking ChatGPT and those kinds of things. Those are words we need to know about it.
Mary Bennett
Thank you, Mary. And last but surely not least, Kaylee.
Kaylee Walstad
Kaylee Walstad, Chief Strategy Officer at EDRM. Hello everyone. Thank you so much for joining us today. I think that the generative AI and large language models are amazing. I use them many, many times a day for a multitude of various things, but I tell everybody to trust but verify. I feel you need to be cautious of people who say whatever tool they’ve built or whatever AI they’ve baked in doesn’t hallucinate. I feel like my controversial thing is it’s in its infancy. It hasn’t even learned to walk yet. And we need to embrace it. It’s here, it’s not going away, and learn how to use it. It’s a great kick-starter for ideas. I do a lot of images with it, but it can be scarily wrong and sound aggressively right. So that is my take. And Mary Bennett, I’m wondering if we should one more time for those who joined a few minutes late, let them know about the raised hand and coming off mute.
Mary Bennett
Yes, for those that just came in, we want today to be fluid and collaborative. Please share your thoughts in the chat, but if you want to share live, please raise your hand using the emoji at the bottom of your screen, and we will unmute you, so you have the floor. Let’s just do a little bit of definition setting. We all are well-versed in gen AI, and I imagine many on this call are as well. But I want to just level set so we’re all speaking the same language for the next 50-odd minutes or so. So why don’t we go to you, John, first. Can you just define what you view as gen AI and how that may differ from other AI tools on the market?
John Brewer
Sure. Generative AI is essentially an AI system that takes a lot of previous input examples, what we call the corpus or the training set of data, and uses that to predict what the next component of some piece of content is. With an LLM, it’s a word; with a diffusion model that creates pictures, it’s greater detail in a section of a picture, and things like that. The particular generation of gen AI that we have now, as opposed to some of the older technologies that did similar things, is largely based on a technology called transformers and a concept called attention in artificial intelligence that I won’t go into the details about. That was the piece of research that bumped us from our previous models in the 2017-2018 timeframe into this GPT world that we’re now living in that’s powering this revolution. Not just in large language models and text-based processing, but, as we’ve been alluding to, in all manner of content.
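As an illustration of the "predict the next component" idea John describes, here is a minimal sketch that was not part of the workshop. It counts which word follows which in a tiny training text and predicts the most frequent follower. Real LLMs use transformers and attention over far larger corpora, so this toy bigram counter only shows the principle; the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for every word in the corpus, which words follow it."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model: dict, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, standing in for "the corpus".
corpus = "the court granted the motion and the court denied the appeal"
model = train_bigram(corpus)
print(predict_next(model, "the"))
```

The model has no notion of truth, only of what tends to come next, which is one way to see why fluent but wrong output is possible.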
Mary Bennett
Thank you, John. I invite any of the other panelists to share whether they have a differing opinion or have a different view on it: Mary, Kaylee, Maura, or Dave.
Maura Grossman
Sure. I would say it involves a combination of machine learning and natural language processing, specifically deep learning and reinforcement learning on the machine learning side. And as John said, it’s trained on massive, massive data sources, primarily the internet and it can generate new content in response to a prompt. The interesting thing about it is that it’s multimedia, so you can go from text to text, text to image, text to video, text to audio. It’s important to make the distinction between generative and discriminative AI. We’ve been using discriminative AI, supervised machine learning, and TAR to classify and predict for a long time. But generative AI is different because it generates new content, and that’s in part why it hallucinates; it’s actually a feature, not a bug that it’s supposed to be creative and generative.
Mary Bennett
Thank you very much, Maura. What about you Ms. Mary Mack?
Mary Mack
Well, I too am amazed at how confident the AI is in its answers. It should know how confident it is I think and introduce some maybe doubt in their answers and sometimes they do that. I just asked it this morning about a legal concept, and it hemmed and hawed about what it was doing with the law. It’s not really practicing law and you must look further and things like that, but I’ve also asked the questions and it just is so confident.
Mary Bennett
Just like many of us, its confidence ebbs and flows. That’s a great point. Thank you, Mary. Kaylee, any thoughts?
Kaylee Walstad
I shared my thoughts earlier in my long-winded hello, I’m Kaylee Walstad.
Mary Bennett
Great. Then Dave, last but surely not least, any thoughts on adding to this definition?
Dave Cohen
Everyone has done it justice. I think what we need to adjust our thinking about is when we have a dialogue with it. [Gen AI] sounds like a human and comes back with answers that sound human, but it’s always important to remember that it didn’t get there by thinking like a human. And so that’s why we sometimes get fooled by these hallucinations. It’s just predicting what words should come next as it goes along based on prior words. It’s almost magic that it can end up creating these sensible-sounding dialogues or pictures in response to prompts because when you hear how it works, it doesn’t make sense. It shouldn’t be able to do all this, but it is amazing. And what we’re seeing is really the first generation. As Kaylee said, this is in its infancy; ChatGPT was released at the end of November 2022.
The hallucination problem was one of the biggest problems. Now with a lot of the development that’s going on, a lot of attention is being placed on cutting that down by otherwise filtering the answers, not just going straight to these large language models to predict, but also then filtering the answers based on smaller groups of documents and data that better refine the answers that come out.
We are seeing two kinds of improvements that will help eliminate hallucinations. One is that sort of fine-tuning for specific purposes like legal purposes. The other is the ability to cite source material. For example, in the old days, you could point it toward a whole lot of documents and say, give me a summary of these documents, or tell me what Jane Doe was talking to John Smith about between 2021 and 2022, and it would give you that summary, and it’d be a pretty good summary.
But now, even better tools will not only give you that summary but say, “Here are the reference documents in the population that I’m basing that summary on.” You can actually trust but verify as Kaylee says. And that’s overcoming one of the biggest issues. The other biggest issue I’ve seen is you’ve got to be careful with what you feed into gen AI because of confidentiality concerns. And I think we’ll get into a little bit of that later, but especially the free versions or the public versions, you’re giving away your data. You’re potentially waiving privilege and giving away secrets, so you have to be very careful what you feed in unless you are subscribing to a gen AI product that has that confidentiality protection built in.
Mary Bennett
Great point. And we definitely will be diving more into security and all the limitations and potential of this as we go. As I noted at the beginning, we do want to have this be engaging for everyone here. We will be launching a quick poll to just gauge everybody’s gen AI proficiency. So I would ask for some participation here.
How would you describe your gen AI proficiency?
- You’re curious but haven’t yet applied it in your day-to-day.
- You use it a little bit here and there.
- You’re a pro and could be on this panel.
- Other
Great. We will give it about a little more than 30 seconds just sitting in silence. I will try and be comfortable with it, but please take a moment that will just help us as well as guide our conversation and the panelist responses.
Kaylee Walstad
While we wait for people to take the poll, building on what everybody has been saying, I would just want to remind everybody that this generative AI is not a robot. It’s a tool and it needs humans to prompt it, to use it, to verify it. So, it’s important to understand and embrace it both the good, the bad, and the ugly. Would you guys agree?
John Brewer
Absolutely. I think human-in-the-loop, generative AI is the only way that this technology currently works, and I’m not convinced that in the near term, at least that’s going to change. I defer to my fellow panelists if they have other opinions on that matter.
Dave Cohen
Yes, it depends on what you call the near term. So yeah, in the next few years, humans need to be in the loop.
Mary Bennett
All right. Well, give or take a minute and 30 seconds, now I’m going to end the poll.
We’re about:
- 41% say they’re curious.
- 46% say they’re somewhat proficient.
- We have 10% of pros here, so love to see that.
- And 2% said other.
If anyone feels as if they want to elaborate on what other means, feel free to type it in the chat. No pressure though. So, Kaylee and Mary, you speak a lot with the e-discovery community through all your EDRM programming and initiatives. Where does this stack up with what you’re hearing at an industry level?
Mary Mack
Go ahead, Kaylee.
Kaylee Walstad
Great question. I would say that there are so many stats, so I would say it’s from our perspective or my perspective, yes, it is in line. I think there’s a ton of people that haven’t even yet embraced it. I talked to somebody this morning who reached out and said their company is adding an AI component to summarize depositions, and, is that something that they can trust to just say, “Summarize this deposition, feed the deposition in”?
I said, first, you must understand it, and once that data is fed in, do they keep the data? Where’s the data stored? Is it not stored? I went through a whole litany of things, and I think there are a lot of people out there who don’t understand and think this is great, this is going to give you a summary. But the summary needs to be verified because, as I said, you don’t even know whether this new tool’s summary could be full of hallucinations and incorrect information. I feel like there are more people in that position than the poll results show. Maura and Dave, what do you guys think?
Dave Cohen
Yeah, I would say that we have a nice, self-selected audience here of people that are interested in gen AI. I think the fact that we have over half the people on this call who are either somewhat proficient or gen AI pro probably greatly exceeds the general population. I’d be surprised if more than 10% of all the lawyers in the US are using gen AI now. This is a smart self-selected sample, but I’ve likewise done polls like this for the last year and a half as I’ve been doing presentations on gen AI, and that number has gone way up. A year ago if we asked the same crowd, I bet you’d get more like 10 or 20% that had used it or was using it instead of the 56% plus that are now using it. So that’s great strides. And when we ask this question again a year from now, we’re going to be up to 85 or 90% at least with a self-selected audience that comes to a presentation like this.
Maura Grossman
I also think back to Kaylee’s point that safety depends on the use case. So, if I provide ChatGPT or another large language model a document and I say, “Please summarize this document,” it is likely to be able to do that task better than if I say, “Without feeding any document, please summarize the Illinois tax code.” Then we don’t know whether it chooses Illinois or Delaware or the IRS or what tax code it’s going to choose and how it’s going to summarize it. I think some use cases are safer than others. I would still agree you need to double-check and do a sanity check, but some things are more or less dangerous, and that’s important to know.
Mary Bennett
I do want to read what Matthew shared in the chat. I think it’s a great point to hit. He said part of the headache we have with eDiscovery is the ever-increasing volumes of data that we have to traverse. He says he doesn’t see transformer models practically having the capability to handle this volume due to overheads and crunching the data. And he wonders whether we will see tokenization and embeddings being baked into eDiscovery. I’m not sure if you all see that comment and have any thoughts on what Matthew shared in terms of the sheer volume of data.
Dave Cohen
Definitely true. Right now, all the major eDiscovery hosting engines are putting in these generative AI capabilities, with different functionalities and different pricing models, like Relativity aiR, which is going to pre-populate whether the document is relevant or not relevant. These capabilities are not going to replace reviewers but speed up reviewers. Reviewers already have that when they look at the documents. Reveal has a feature where you can ask your database a lot of questions about the data. DISCO Cecilia is already out there and has that functionality and the ability to create timelines. Everlaw Discovery Assistant is out there. But if you’re talking about the biggest eDiscovery task, which is reviewing large volumes of documents for responsiveness and privilege, none of these systems yet are a replacement for human review, and it’s not clear that they will be a replacement even for TAR anytime soon, partially because of the pricing models.
It’s expensive to put a lot of documents through generative pre-trained transformers, especially if you must pay OpenAI for tokens to put them through ChatGPT. So, right now, with the pricing models that we’re seeing and hearing about, I don’t expect this to replace predictive coding until some things change in terms of the workflows and the pricing models. It may partially replace it in some smaller document cases, but for large volumes of documents, it would be pretty expensive right now to use some of the new features, at least with pricing that’s being announced or rumored at this point.
Maura Grossman
I would agree with that. I think you must remember that these tools have what are called context windows, so there’s only so much you can put in at a time. In many cases with long documents or long spreadsheets, the volume would just be prohibitive in terms of costs. When we think about whether we want to use these tools, not only do we have to look at performance and whether they actually improve performance over, for example, straight TAR, we also have to look at time and cost. Cost right now is probably prohibitive, as David said, for most large matters; to load this stuff would be very, very expensive and more expensive than alternatives.
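To make the context-window constraint concrete, here is a hedged sketch that was not shown at the workshop. It assumes a common rule of thumb of about four tokens for every three English words (actual counts depend on the model's tokenizer) and a hypothetical 4,000-token window, then splits a long document into pieces that fit. The function names are invented for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate. Assumption: about 4 tokens per 3 words."""
    words = len(text.split())
    return (words * 4 + 2) // 3  # integer ceiling of words * 4 / 3


def split_for_context(text: str, window_tokens: int) -> list:
    """Split text into word chunks whose estimated tokens fit the window."""
    words_per_chunk = max(1, window_tokens * 3 // 4)
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


# A stand-in 10,000-word document and a hypothetical 4,000-token window.
document = "word " * 10_000
chunks = split_for_context(document, window_tokens=4_000)
print(len(chunks), [estimate_tokens(c) for c in chunks])
```

Every chunk is a separate request, so long documents and spreadsheets multiply both processing time and per-token charges.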
John Brewer
This isn’t a problem that just the legal field is having with gen AI right now. It’s extremely expensive computationally and financially compared to most other AI capabilities that are out there and historical models that we’ve had in machine learning and NLP and such. That is why I think a lot of the companies that are aligning to saner price points aren’t taking the approach of running gen AI, to use the review example, against every single document (what we’ve referred to as the boil-the-ocean approach to generative AI). Instead, they are using other tools that we’ve had in our arsenal for years or decades to cut down the amount of data that we’re processing, and using gen AI for the things that only gen AI can do. And that, I think, is going to be the first real production generation of tools that we’re seeing, certainly at least until the cost of gen AI comes way, way down, assuming it ever does.
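The "cull first, then use gen AI" workflow can be sketched roughly as follows. This is an illustration only, not a tool discussed on the panel: the documents, keywords, token rule of thumb (about four tokens per three words), and the per-1,000-token price are all hypothetical placeholders, and a real workflow might cull with TAR rather than keywords.

```python
def keyword_cull(documents: list, keywords: list) -> list:
    """Keep only documents that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [d for d in documents if any(k in d.lower() for k in lowered)]


def llm_cost(documents: list, price_per_1k_tokens: float) -> float:
    """Estimated cost of sending every document through a per-token-priced model."""
    tokens = sum(len(d.split()) * 4 / 3 for d in documents)  # rough estimate
    return tokens / 1000 * price_per_1k_tokens


# Hypothetical mini-collection; real matters involve millions of documents.
documents = [
    "Quarterly merger update for the board of directors",
    "Lunch order for Friday please reply by noon",
    "Draft merger agreement with redlines attached",
    "Parking garage closed for maintenance next week",
]
price = 0.01  # hypothetical price per 1,000 tokens, not a real vendor rate

culled = keyword_cull(documents, ["merger"])
print(len(documents), llm_cost(documents, price))  # boil-the-ocean approach
print(len(culled), llm_cost(culled, price))        # cull first, then gen AI
```

The saving scales with how much the cheaper tools remove before any document reaches the generative model.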
Mary Bennett
Great. As we’ve seen gen AI change almost weekly or daily, let’s take a wider look at the tremendous amount of change over the past year. Maura, what do you think is the single most significant advancement you’ve seen in the last 12 months?
Maura Grossman
I think there are two. One is the improvement in voice clones, which is really scary, to the point that somebody can be on a Zoom call with five other people and not have a clue that four of them were fake. That is just staggering when you think about it, as is the potential for fraud and for faking Biden’s voice. So audio has improved. The other is video: if you’ve seen OpenAI’s Sora video, you can’t help but be blown away at the quality of the video with just a text prompt. Now it gets some physics wrong, like the sizes of things or things going backward when they should be going forward, but the quality of those images is just staggering. So for me, it’s the audio and the video that I think just have really moved ahead.
Mary Bennett
Thank you. Dave, what are your thoughts on this?
Dave Cohen
Yeah, it’s scary. I mean, warn your parents and grandparents that the next time they get that call that their grandkid is in trouble and needs money sent to them, it could be somebody actually mimicking the voice very well, and they don’t need to capture much audio to do that. This is one of the risks. As I’ve said, I think the benefits could outweigh the disadvantages of this technology, like most technology that’s come before, but that’s certainly something to watch out for. But in terms of the practical uses of AI for legal practice and eDiscovery, I think the biggest development is the fact that people are building these guardrails around just the original ChatGPT. They focus it not just on large language models trained on the whole internet, but on legal-specific large language models, and use the RAG technology to focus the answers and give legal answers that make sense and that reduce hallucination.
I agree with Kaylee, you still shouldn’t trust it without verifying it. You have to protect the privacy of information. Putting things behind either your firewall or another firewall with adequate security protection made the difference between making these products theoretically useful and letting you start to use them in practice. And I think that’s where we’re going to see a lot of continued improvement over the next year or two.
Mary Bennett
Thank you, Dave. Kaylee, any thoughts from you?
Kaylee Walstad
I agree with everything that everybody said. I’ll reiterate again, it’s a tool, and it’s only going to get better. As I said, it’s in its infancy. Since it came out in November 2022, we’ve seen it evolve; now we’re at ChatGPT-4, and 5 is going to come out. Each of these iterations is going to be better. But I think that as you incorporate it with potentially sensitive documents, you need to ask those second- and third-level security questions. And there are questions that I think a lot of people don’t know to ask, like Dave was just saying:
- Is there a container or a silo around the data?
- Do you store it?
- Where does the data go once it’s input?
Dave and Maura, what are some questions that you… Let’s start with John. John, what are some questions that you think people should be asking that they may not realize?
John Brewer
Questions about what in particular would you say here?
Kaylee Walstad
The security of the LLM that’s underlying that they’re inputting potentially sensitive data into.
John Brewer
There are two major locations that data that you put into the LLM can flow that you need to be concerned with. One, is this going to end up in model knowledge? Which is kind of the classic, are you going to train off my data? Are you going to use it to improve the information? That’s what stung Samsung last year when their engineers input company secrets into ChatGPT, and it started producing them. That’s the real concern on that side. The second question is a more traditional platform and cloud question that arguably you should be asking about all your providers, which is are the inputs that I’m putting into this being logged? Is that data being kept around for operational purposes, for research purposes, or for any other reasons? There’s this whole side industry right now that is basically crash forensics on AI, and it’s not entering the legal space yet.
It’s still a technical thing, but the systems that are being used are keeping huge amounts of telemetry including every prompt and every response that they’ve ever received or sent out. And that data is just floating around in the environment, and it is retained data. Theoretically, it could be accessible in a discovery process or something else if anybody knows that it’s there. And so, asking reasonably pointed questions not only about, “Hey, are you training with my data, but are you keeping my prompts and responses even if I’m doing RAG, even if I’m providing everything, especially if I’m providing everything? And how long are you keeping that information? What is its life cycle?” I think that those are the two major questions you need to ask any AI service provider that you’re working with these days. I yield my time.
Mary Bennett
Well, I have another question for you, John, so you’re back on. You talked at Legalweek about how AI has changed tremendously over the last year. You said there’s just been a plethora of new tools that have hit the market, and we don’t really need to fine-tune these models because they’re now using pre-trained models. We’ve talked a little about document review, but what’s the “so what” of this evolution, and what does that mean for eDiscovery pros and doc review?
John Brewer
I mean, I think that Maura talked a little bit about this, or at least alluded to it earlier, in that last year, like early 2023, we thought that our context window size was going to be around 4,000 tokens or 8,000 tokens, which is roughly four to eight pages of text. In a review context or even a large information processing context, that isn’t a lot of information. A lot of the research then was going into how we train these things, how we fine-tune models, how we add into model knowledge, and then ask questions of that model knowledge, which exponentially increases the probability you’re going to get hallucinations in your responses. It also meant that we were going to be fine-tuning models for individual cases. We were talking about certainly for individual clients, and we were worried about the fact that we were creating these effectively PII objects that need to be managed and destroyed and all that. Suddenly, over the summer, context windows blew out by two orders of magnitude, up to hundreds of thousands of tokens, and RAG became the thing.
You don’t ask the model questions from its model knowledge anymore. You give it a bunch of data and say, “Please summarize this.” I think that was the big change that happened, and it not only meant that we could move away from training models with large amounts of knowledge about sensitive information, but it meant that there was a benefit to doing that. Now, we could just run against pre-trained models at scale. That is what we’re seeing in a lot of the products that are going to market now. I think that as we get to the next generation, there’s going to start a shift back to fine-tuning models because you do get better quality results that way. But I think that all the forces in terms of both commercial pressures and research pressures are moving us towards large pre-trained models like your GPT-4s and your Claudes and your Llamas of the world. For the moment that training has taken a backseat, and we’re living in a RAG world at least for the last four or five months.
Dave Cohen
RAG stands for retrieval augmented generation, right? John?
John Brewer
Yes. I’m sorry. I should have defined that term earlier in the talk.
Maura Grossman
The difference between fine-tuning and RAG is that for fine-tuning, you take, say, legal documents and train on those documents. The model is specifically trained on a particular set or topic of documents. RAG is where you go outside the system to check or confirm content. Maybe it goes out to the web and checks that the answer to the question is the correct answer. And actually, the best tools use a combination of both. They fine-tune on a specific domain, but they also go out and find the source externally.
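For readers who want to see the shape of retrieval augmented generation, here is a minimal sketch that was not presented at the workshop. It follows John's description (hand the model the relevant data together with the question) and Dave's point about citing source documents. Retrieval here is naive word overlap; production systems typically use embeddings and a vector index, and the document IDs, texts, and function names are all hypothetical. No model is called: the sketch stops at building the prompt.

```python
def retrieve(question: str, documents: dict, k: int = 2) -> list:
    """Rank documents by how many question words they share; return the top k."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, retrieved: list) -> str:
    """Put the retrieved sources in the prompt and ask for cited answers."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieved)
    return (
        "Answer using only the sources below and cite their IDs.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Hypothetical review population.
documents = {
    "DOC-001": "Jane Doe emailed John Smith about the merger timeline",
    "DOC-002": "Cafeteria menu for the week of March 4",
    "DOC-003": "John Smith replied to Jane Doe about merger financing",
}
question = "What did Jane Doe and John Smith discuss about the merger"
top = retrieve(question, documents)
prompt = build_prompt(question, top)
print(prompt)
```

Because the sources travel with the question, a reviewer can check each cited ID against the underlying document, which is the "trust but verify" step the panel keeps returning to.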
Mary Bennett
We’ve been hitting on this throughout the panel. How do you all do or do not use gen AI? You all sit in different parts of eDiscovery. Maura, you’re a professor. Dave, you work in a law firm. John, you’re from a technology company. Take us through your day-to-day. How do you use this technology? I know every day is different, but just from a typical Monday, maybe. Dave, we’ll kick it off with you. How are you using gen AI in your role?
Dave Cohen
So, personally, I’ve started to use it to generate content for presentations. For example, and I don’t start with it from scratch, but I’ll come up with my outline for presentation and then compare it with what I can get from ChatGPT. I’ll use the DALL-E, the illustration feature, to create PowerPoint slides. We’re very careful not to just pull pictures off the internet because of worrying about violating somebody’s copyright or intellectual property. I suppose arguments will be made that DALL-E will also be violating intellectual property to the extent that it creates new material out of old material. But there haven’t been any legal rulings yet that anything that’s generated by DALL-E actually violates anyone’s copyright. I’d be interested to see where the law goes there. I started to use Microsoft Copilot, which enables you to ask certain things. It’s not terrifically impressive yet, but it’s going to improve a lot.
I use all of those things in my daily life, but where I’m researching it is within our eDiscovery group, and a lot of what they do is large document reviews. We are looking at when and how we can implement it for document reviews. We’ve been using predictive coding for a long time. We are not implementing any of the generative AI yet at scale for large document reviews, although we will be. We’re looking at products. I remember when I first started practicing, paralegals summarized depositions, and I summarized depositions. Anybody who’s still doing that, stop, okay, because gen AI can do it just as well, and immediately and much less expensively for clients. There are some features right now where it is ready for prime time. And again, getting back to the major hosting engines, Cecilia from DISCO can already answer questions and give you some very interesting information.
Relativity aiR has been released in beta, and I think it’s slotted for a full release in the third quarter. Reveal Ask is already out there, and Everlaw Assistant is out there in beta. They are not yet replacing the biggest, most time-consuming, most expensive task, which is human review, and they’re not replacing privilege review either. So, between responsiveness and privilege, responsiveness will be easier to replace. Privilege is going to be more difficult, but there is potential in all those areas. But as John said, and as Maura said, because of the expense, you probably aren’t going to apply these until after you’ve used other tools to filter, whether it’s the old keywords that are still around or TAR, and then you apply the generative AI to a smaller subset at the end.
Mary Bennett
Thank you, Dave. Mary Mack, what about you? How are you using gen AI in your day-to-day? You said you used it this morning.
Mary Mack
I did. I asked Google Gemini to explain itself basically as if I was asking it for a manual. And curiously, it gave me this whole answer on how to use Gemini for trading crypto because there’s another product named Gemini. It didn’t even realize I was asking it about itself, so that was interesting. But I do things with WordPress. WordPress is like a programming language. I’m just amazed at what it can do. You can ask it a question; it’ll give you a step-by-step of how to do things. I use it to look at documents for proofreading, formatting, and tone. Those are things that I would’ve had a human do. The next thing I’m going to experiment with, though, is un-redacting. Bruce Schneier, one of the leading security guys, said that because ChatGPT works by predicting the next word, an LLM might be able to be used for un-redacting. I want to test that out because I think that’s an important thing for us to know.
Mary Bennett
Thank you. John, you’re up, my friend.
John Brewer
I’ve used a whole bunch of different gen AI tools in kind of a toy capacity just to play with them. But the one that I use in my day-to-day work is actually from Microsoft. It’s called GitHub Copilot. Why Microsoft has two different tools named Copilot, I don’t understand, but I’m not a marketing person. There may be some insight there that I’m missing. But unsurprisingly, I write a lot of code in my life, and what this does is it sits in the background as I’m typing out computer code, and it says, “Okay, I think I see where you’re going with this,” and then gives me a prompt saying, “Hey, I think that this is what you’re trying to write. If it is, hit the space bar, and we’ll just insert this all for you, and you can move on with your day.” And I’ve been using it for about three months.
It’s fascinating to watch it getting better as it learns from processes, and it does things I disagree with. Sometimes it’s wrong, and sometimes it’s dramatically wrong, but it’s a fantastic model, I think, of exactly the kind of human-in-the-loop use of generative AI where it’s very quickly coming up with generated content and prompting the user in an almost fluid way to say, “Is this where you’re going? Can I save you 30 seconds or two minutes of your time here? Push spacebar. If not, ignore me and go on with your life.” And that use case, I think, was very innovative, and it’s a great closed-cycle loop. I usually use it as an example when I’m asked for a good application of the technology.
Mary Bennett
Thank you, John.
John Brewer
Maura?
Maura Grossman
Me? Okay. I use it for the same purposes as Dave. Content for slides is probably my biggest use, but this one is sort of funny. I am obsessed with AI as evidence, and I’ve been writing a lot with Judge Grimm, so I make a lot of deep fakes. I have taken Judge Grimm’s face and I’ve tried it out on lots of different tools, some of which are more convincing than others. I played with his voice. I played with my voice, and I’m trying to learn as much as I can about how good and compelling this stuff is [in terms of] how do you start to distinguish real from fake? I’ve been playing with a lot of deep fakes of various types, both audio and video and images.
Mary Bennett
I love the word deep fake, just as an English major, but that’s neither here nor there. I want to go to Andrew’s question. Maura, I think this speaks to that article you wrote with Judge Grimm, but given the industry’s use of TAR and CAL for years and our general acceptance of ESI protocols, do you think there’s going to be a lag or barrier to using gen AI that judges or opposing parties will put up? [Is that valid with gen AI’s] results having been demonstrated to be effective with better recall and precision? What are your thoughts there on any apprehension from opposing counsel or judges in that regard?
Maura Grossman
It depends whether we have the evidence for it or not. Let’s rewind to the early 2000s when there was the Text REtrieval Conference, TREC, and for years they were doing benchmarking exercises that clearly showed that TAR was better than keywords and manual review. And it was independent of any of us. It was a government organization and a huge effort. And by the time Gordon and I wrote our 2011 paper, there was data out there, and you could show scientifically that this stuff was better or at least could be better, not automatically better in every case. So, right now, we don’t have that. We have a few ad hoc, non-reproducible, anecdotal sort of white papers, and that’s certainly not going to be enough, I think, for courts. With TAR, they could rely on peer-reviewed pieces published in scientific journals and on TREC to say, “This is permissible.” Right now, I don’t think a court has anything to lean on.
I think we need to do that empirical work first, and once we have that, if the science is there, then it’s a no-brainer to use the technology. But if the science isn’t there and it’s just, well, it seems to me to give the right answers, but I haven’t compared it to anything else in a formal way that’s been peer-reviewed by other people and published, then I think we’ve got a problem.
Mary Bennett
Another part of Andrew’s question, which is something we’ve talked about, is the cost of using this. Do any of the panelists have any thoughts about using gen AI with continuous active learning and how to deploy both of those together to expedite review? Does anyone have any thoughts on using a combination of tools with gen AI?
Maura Grossman
I can say this, so as many people on this call know, Gordon and I developed CAL and introduced it in, I don’t know, 2013. I think we put out our first study, but we’d been using it before then. We’ve had a lot of graduate students at Waterloo who come marching in saying, “Large language models, we’re going to beat your, you know what.” And we say, “Have at it. You can have access to our tool and here’s a TREC data set or a huge data set, and take a large language model and beat it.”
And it’s been three or four years, and at least for the CAL task, I’m not talking about other tasks like David was talking about or John, but for the pure separate this into relevant and non-relevant documents, no graduate student has been able to do it with a large language model yet. I don’t know about combining it because I’m not sure it will improve it. It may add other things to other aspects or parts of the process. Once you’ve divided your documents, maybe you want to separate them into issues, maybe it’ll be better and more efficient at finding all the ones on issues. But to me, these are all empirical questions. These aren’t matters of opinion, they’re matters of science.
Mary Bennett
Well, Ralph did post in the chat, so I’m going to read directly from what Ralph shared that he agrees wholeheartedly with you on empirical study, but counters, isn’t this industry such a moving target and so diverse that getting reliable foundation for court approval will in fact be difficult?
Dave Cohen
Let me jump in for a second just to say that I think that to a large extent, TAR and the case law out of TAR have paved the way a little bit for acceptance of new technology. I mean, TAR has been around since the early 2000s, and it took more than a decade to get acceptance, and it still hasn’t replaced most human review, notwithstanding the empirical studies that Maura has referred to. What we have come out with and have gotten commonly accepted now are ways of validating results. We have statistics like recall and precision, and we know how to sample. I’m not sure we need a TREC. I mean, the TREC project was going on for years, and I had problems with it, to be honest, because I never thought that the human review they were comparing it to was a very good quality human review.
It certainly wasn’t the kind of human review we do with our group. And when we compare our results to TAR, we do better than TAR. I don’t trust a lot of those statistics, but what we do have now are ways of validating. I don’t think you need to have two or three years of scientific papers if I can do a project tomorrow and run it, whether it be TAR or gen AI on it, and then look at the results. You take a statistically significant sample, and did it get it right, and did it beat human review? That’s all you need. One good thing that came out of TAR was we did get these opinions, and if you look at Judge Peck’s opinions in Da Silva Moore and Rio Tinto, they don’t just talk about TAR, they talk about technology assisting review.
They don’t just talk about predictive coding. I know TAR’s been coined as a synonym for predictive coding, but he’s talking about all kinds of technology, and that’s now been accepted. Anything we can validate that shows, “Hey, this is just as good or better than human review,” that should be accepted pretty quickly now. We can do that on a case-by-case basis just through validation testing. It doesn’t have to be all that complicated or expensive. I think the way has been paved. I agree with Maura that gen AI still has to prove that it can do it as well or better. Maybe we don’t need it so much for relevance review because TAR’s gotten us most of the way there. Maybe it can accelerate training the TAR, and obviously, with continuous active learning, the training goes on. But for example, I see more potential for the generative AI to improve privilege calls, which traditional TAR has always had a problem with. So I think there are other applications where it’s going to help a lot, but maybe not review getting better than TAR any time very soon.
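Editor’s Note: The kind of sample-based validation described above can be sketched in a few lines of Python. This is a minimal illustration, not a defensible protocol: it assumes a simple random sample of documents, treats the human reviewer’s coding as ground truth, and uses a Wilson score interval as one common way to express sampling uncertainty. The function names and the sample counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def validate(sample):
    """Estimate recall and precision from (tool_call, reviewer_call) pairs.

    Each pair holds two booleans: whether the tool marked the document
    responsive, and whether the human reviewer did.
    """
    tp = sum(1 for tool, truth in sample if tool and truth)
    fp = sum(1 for tool, truth in sample if tool and not truth)
    fn = sum(1 for tool, truth in sample if not tool and truth)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "recall": recall,
        "recall_interval": wilson_interval(tp, tp + fn),
        "precision": precision,
        "precision_interval": wilson_interval(tp, tp + fp),
    }


# Hypothetical 500-document sample: 80 true positives, 10 false positives,
# 20 false negatives, and 390 true negatives.
sample = (
    [(True, True)] * 80
    + [(True, False)] * 10
    + [(False, True)] * 20
    + [(False, False)] * 390
)

result = validate(sample)
print(result)
```

The same calculation applies whether the calls being checked came from TAR, gen AI, or a human team, which is the point being made about case-by-case validation. The width of the interval depends on how many responsive documents land in the sample, so low-prevalence collections need larger samples.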
John Brewer
I think that that’s a good point. There is a difference, at least in the model that I see most often for privilege, which is when we use gen AI to train the training sets for TAR and CAL and then use the CAL model to evaluate the rest of the data sets. With human review, we usually have a lot of people doing that review, and it helps even out any implicit biases in the data. One of the interesting things that we’re seeing in LLMs, especially very large LLMs, is very subtle biases in the data, especially when it comes to things like proper nouns and proper nouns that are also words in English where you can end up with a couple of, well, like I said, biases in your data where that could end up showing up in TAR in a way that traditional validation wouldn’t catch.
Whether or not it’s a problem is something that’s still kind of a point of active research, and Maura, you might be able to speak a little more coherently to the state of research in this field than I can, but it’s something that we’re concerned about when we have exactly one filter effectively as opposed to a group of humans all with their own biases and error rates and thoughts doing document review. We lose that diversity benefit to our statistical analysis. And I think it’s something that gets left by the wayside when we take a strict garbage-in, garbage-out compartmentalized approach to TAR in terms of replacing L1 or even L2 review with gen AI.
Maura Grossman
Well, you’re right about that, John. The recent research is very interesting. I remember the early TAR opinions that said you need a senior partner to do the review, otherwise the tool will not be trained properly. Well, the research we did, and that other people did, shows it’s exactly the opposite, that the senior partners had extremely narrow views of what was relevant and having more noise with a diverse team of reviewers got a lot better recall than your super narrow partner did. I think that it is true that having a little bit of noise and diversity improves the algorithm. I mean, you don’t want to put a lot of garbage in, then, of course, garbage in, garbage out. But there’s no such thing as perfect because even Dave and I sitting at a computer wouldn’t agree on 30% of the documents.
John Brewer
I won’t name names, but I know from experience that the same person looking at the same document on two different days will frequently classify it differently. There is that.
Mary Bennett
In the interest of time, we have five minutes. I feel like this has flown so quickly. If anyone has any questions or thoughts, please put them in the chat. Otherwise, we’ve talked about security risks and questions you should ask. Let’s talk a little bit about other limitations or things that folks should be aware of in terms of how practitioners can ensure that they’re avoiding issues like copyright infringement, breach of contract, or violation of privacy rights. That’s a very big question. John, why don’t you kick us off and take that as you will?
John Brewer
That’s another hour of conversation right there. I will say that I think Google Gemini, which I think Mary Mack mentioned earlier, if I recall correctly, one of their selling points is that they have provenance for their entire training set. I guess that is something you can only do if you’re the size of Google, and they will indemnify their users against that sort of copyright risk, which is an approach to solving that problem. It’s kind of the brute force like, “Okay, fine. Well, if we have to validate the whole data set, we’ll validate the whole data set.” I will be interested to see what the consensus of rulings comes down to in terms of gen AI creating or generating copyrighted material because I think we’re going to start digging down into some interesting legal philosophical questions that I’m not qualified to [give an opinion on] before we get a cultural standard for that.
Mary Bennett
Great. Kaylee, any thoughts on this?
Kaylee Walstad
I’m on mute, now I’m already starting to talk. I agree. I think there’s not enough information currently. It’s, I think, a scary time of what has already been taken or infringed upon, and I think we’re going to see a lot more coming out about that in terms of lawsuits and all kinds of different actions. Right now, I think there’s a lot we still don’t know.
Mary Bennett
Anything to add, Dave, Mary Mack, or Maura?
Maura Grossman
I’ll add a couple of things. One, if you’re worried about sensitive information, don’t enter it in the first place, right? Read the terms of service. Know whether the provider is training on your information or selling your information. If it doesn’t say, assume they are. And frankly, somebody put this in the chat earlier, how do you know they aren’t, even if they say they aren’t? And that’s a good question because Facebook and others have promised lots of things that didn’t turn out to be true. Trust but verify. As Kaylee said, don’t ask for exact dupes. Don’t say, “Give me the first paragraph of an article in The New York Times from last week.” Don’t ask for reproduction. It’s okay to ask for a reproduction of Shakespeare. He’s dead. It’s been more than 70 or 80 years, but don’t ask for last week’s author. Those kinds of things are common sense. But how many of us read the terms of service? We don’t. We’re back in the early days of cloud and we got to read those contracts, and see what they say and make sure we’re in compliance.
Dave Cohen
From a legal perspective, you have to worry about more than just whether they’re training on or selling your data because we have a duty to protect the confidentiality of data. And I know I’ve talked to some people who said, “Okay, I checked off the part on ChatGPT that I don’t want them to use my data for training. I’m safe.” You’re not safe because the terms and conditions of the free version of ChatGPT say you’ve now given them access. Even if they’re not going to sell it, even if they’re not going to train on it, they now have access to your data. What does that do to privilege? What does that do to your duty to protect the client’s secrets? As has been pointed out before, even if they do guarantee all of that, a company’s only as good as its security protections, and everybody’s getting data breached these days. Before putting any confidential or sensitive data anywhere, you need to do the due diligence to make sure it’s going to be safe, and that the protection is appropriate to the sensitivity of the data.
Mary Bennett
Well, it’s been an hour. There is a lot more we could dive into, but I want to give an extreme thank you to John, Dave, Maura, Kaylee, and Mary for your insights in this conversation as well as to everybody who attended. This has been a fantastic conversation. We will be sending out a copy of this recording after the event if you registered with your email, so just be on the lookout for that. We are excited to continue our workshops. Please stay tuned for our Q2 date. And I think I speak for everyone: connect with me, Kaylee, and Mary on LinkedIn. We’d love to continue the conversation as well.
Kaylee Walstad
And Mary, we have a huge shout-out to EDRM’s trusted partner, HaystackID, for sponsoring and supporting our Q1 first workshop, making Mary Bennett available to organize and moderate. You did a terrific job, as always, Mary. A special shout out to Rob Robinson, who has supported EDRM and also is an early adopter of generative AI and has helped Mary and me in the background since November 2022 and pushed us along. And Ralph, another AI expert who has actually gifted me. I don’t know if he gifted Mary, but on ChatGPT, you could create your applications, and he created a visual muse that I have tipped him multiple times a day. It creates great images. Find somebody to mentor you if you don’t know. If you haven’t used it, ask questions. Find somebody who is using it because it’s made a huge difference to Mary and me. Thank you so much.
Mary Bennett
Thank you everyone. Enjoy your day.
Kaylee Walstad
Have a great day.
Expert Panelists’ Bios
+ Mary Bennett
Senior Director, Content and Community Initiative, EDRM
Director, Content Marketing, HaystackID
Mary Bennett, HaystackID’s Director of Content Marketing, focuses on the power of storytelling to educate the legal technology industry on pressing issues impacting practitioners. With nearly 10 years of content marketing experience, Bennett joined HaystackID after working at an agency to help B2B tech startups grow their marketing engines through content that drove audiences through the marketing funnel.
Before her agency experience, Bennett worked at Chicago-based Relativity as a Senior Producer on the Brand Programs team. She was a founding member, host, and producer of Relativity’s Stellar Women program and producer of the company’s documentary series, On the Merits. In her role, Bennett crafted and socialized important stories that elevated the eDiscovery community and illustrated technology’s potential to make a substantial impact.
+ John Brewer
Chief Artificial Intelligence Officer and Chief Data Scientist, HaystackID
John Brewer has worked with HaystackID since 2015 and now serves as the Chief Artificial Intelligence Officer and the Chief Data Scientist. In his role as CAIO, Brewer guides the company’s integration of artificial intelligence and generative AI within the legal technology sector, capitalizing on his remarkable two decades of AI experience.
He has been pivotal in the adoption of large-scale technology-assisted review, developing the suite of AI-based machine learning tools that power HaystackID’s review process, driving loss prevention, and ensuring unparalleled efficiency – most notably, Protect Analytics, the company’s exclusive set of technologies and processes that allow client data set analysis for sensitive information including personally identifiable information and protected health information, as well as data breach code anomalies. Brewer’s approach avoids opportunistic trends, centering instead on thoroughly researched AI solutions that are in line with the client’s real needs and ethical standards. Brewer’s deep understanding of decades of AI capabilities distinguishes him as an exceptional leader and innovator.
+ David R. Cohen
Partner, Reed Smith, LLP
David Cohen is the chair of Reed Smith’s Records & E-Discovery (RED) Group and a member of the Emerging Technologies group. A Harvard Law graduate with more than 35 years of commercial litigation experience, Cohen serves as eDiscovery counsel and information governance counsel to some of the top companies in the world. He also represents clients in complex litigation matters and counsels companies of all sizes on information governance and litigation readiness issues.
Cohen has been recognized individually by Chambers Global, Chambers USA, Super Lawyers, Best Lawyers, and Who’s Who Legal as a top eDiscovery lawyer and litigator. He has also received a Law 2.0 “Outstanding Leadership” award in 2022 and the Legal Intelligencer Pennsylvania “Innovator of the Year” award in 2023. In addition to individual recognition, the 70+ lawyer RED Group that he leads has been recognized by Chambers and Legal 500 as a leading eDiscovery practice.
+ Maura R. Grossman
Principal, Maura Grossman Law
Research Professor, School of Computer Science, University of Waterloo
Maura Grossman is a Research Professor in the David R. Cheriton School of Computer Science at the University of Waterloo, an Adjunct Professor at Osgoode Hall Law School, and an affiliate faculty member at the Vector Institute of Artificial Intelligence, all in Ontario, Canada, as well as an eDiscovery attorney and consultant in Buffalo, New York. Previously, Grossman was of counsel at Wachtell, Lipton, Rosen & Katz, where for 17 years, she represented Fortune 100 companies and major financial services institutions in civil actions and white-collar criminal and regulatory investigations and advised the firm’s lawyers and clients on legal, technical, and strategic issues involving eDiscovery and information governance, both domestically and abroad.
Grossman is a well-known and influential eDiscovery lawyer. She is described in Who’s Who Litigation 2015 E-Discovery Analysis as “‘sensational’ according to her peers and . . . a ‘go-to’ in the area.” Chambers & Partners USA 2015 Litigation: E-Discovery described her as “the best-known person in the area of technology-assisted review, a superstar among superstars.”
Grossman’s scholarly work on TAR, most notably, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, published in the Richmond Journal of Law and Technology in 2011, has been widely cited in case law, both in the U.S. and elsewhere.
+ Mary Mack
CEO, Chief Legal Technologist, EDRM
Mary Mack leads the EDRM, a project-based organization, and is the former Executive Director of a certification organization. Mack is known for her skills in relationship and community building as well as for the depth of her eDiscovery knowledge. She is frequently sought out by media for comment on industry issues and by conference organizers to participate, moderate a panel, lead a workshop, or deliver a keynote. Mack is the author of A Process of Illumination: The Practical Guide to Electronic Discovery, considered by many to be the first popular book on eDiscovery.
She is the co-editor of the Thomson Reuters West Treatise: eDiscovery for Corporate Counsel. Mack was also recently honored to be included in the book 100 Fascinating Females Fighting Cyber Crime, published by Cyber Ventures in May 2019. Mack has been certified in data forensics and telephony. Mack’s security certifications include the CISSP (Certified Information Systems Security Professional) and the CIAM (Certified Identity and Access Manager).
+ Kaylee Walstad
Chief Strategy Officer, EDRM
Kaylee Walstad leads the global project-based organization and is the former VP of client engagement at a certification organization. Walstad is known for her role in building communities, uniting people and companies across the globe, and brand amplification for partners through social media and events. She is a frequent public speaker on a variety of topics from personal development to the nuances of eDiscovery. She has a broad background in eDiscovery and skills that uniquely position her to provide insight into the challenges faced by the end-users of eDiscovery services and technology and the organizations serving them.
Walstad has extensive expertise in developing cross-organizational discovery strategies for large litigation and investigations and is a Certified Identity Management Professional (CIMP).
About EDRM
Empowering the global leaders of e-discovery, the Electronic Discovery Reference Model (EDRM) creates practical global resources to improve e-discovery, privacy, security, and information governance. Since 2005, EDRM has delivered leadership, standards, tools, guides, and test datasets to strengthen best practices throughout the world. EDRM has an international presence in 145 countries, spanning 6 continents. EDRM provides an innovative support infrastructure for individuals, law firms, corporations, and government organizations seeking to improve the practice and provision of data and legal discovery with 19 active projects.
About HaystackID®
HaystackID solves complex data challenges related to legal, compliance, regulatory, and cyber events. Core offerings include Global Advisory, Data Discovery Intelligence, HaystackID Core® Platform, and AI-enhanced Global Managed Review powered by its proprietary platform, ReviewRight®. Repeatedly recognized as one of the world’s most trusted legal industry providers by prestigious publishers such as Chambers, Gartner, IDC, and Legaltech News, HaystackID implements innovative cyber discovery, enterprise solutions, and legal and compliance offerings to leading companies and legal practices around the world. HaystackID offers highly curated and customized offerings while prioritizing security, privacy, and integrity. For more information about how HaystackID can help solve unique legal enterprise needs, please visit HaystackID.com.
*Assisted by GAI and LLM technologies.
Source: HaystackID
*Reed Smith, LLP
**Maura Grossman Law
***EDRM
****EDRM