Editor’s Note: Effective data management is more critical than ever, and the latest HaystackID® webcast, “Data Minimization: Why Less is More,” offered expert insights on why organizations must take a strategic approach to data retention and disposal. As privacy laws change and regulatory scrutiny increases, businesses can no longer afford to take a passive stance. Experts Christopher Wall, Esther Birnbaum, Tara Emory, and Peter S. Hyun shared practical strategies for reducing legal risk, improving compliance, and cutting unnecessary costs through thoughtful data minimization. The discussion explored the fragmented nature of U.S. privacy laws, the growing role of the FTC in enforcement, and the financial impact of poor data hygiene. The panelists spoke about a real-world compliance failure, highlighting the risks of retaining excess data and reinforcing the importance of clear retention policies, regular audits, and AI-driven automation. Organizations adopting Generative AI (GenAI) to enhance their data governance strategies must carefully implement these technologies to balance risk and compliance. Read the transcript to learn how to embrace proactive data minimization to strengthen security, streamline operations, and adapt to shifting regulatory demands.


Expert Panelists

+ Esther Birnbaum
EVP, Legal Data Intelligence, HaystackID

+ Tara Emory
Special Counsel, Covington & Burling LLP

+ Peter S. Hyun
Law, Policy, and Investigations Expert, Formerly with the FCC | DOT | DOJ | NY AG | U.S. Senate

+ Christopher Wall (Moderator)
DPO and Special Counsel for Global Privacy and Forensics, HaystackID


[Webcast] Data Minimization: Why Less is More

By HaystackID Staff

The recent HaystackID® webcast, “Data Minimization: Why Less is More,” brought together a panel of industry experts to discuss why reducing unnecessary data isn’t just a regulatory requirement—it’s a strategic advantage. Moderated by Christopher Wall, HaystackID’s DPO and Special Counsel for Global Privacy and Forensics, the session featured Esther Birnbaum, Tara Emory, and Peter S. Hyun, who explored how organizations can effectively manage data retention and disposal to mitigate risk, comply with evolving privacy laws, and reduce legal and regulatory exposure. One of the biggest hurdles in data minimization is the lack of a unified federal privacy standard in the U.S., unlike the EU’s GDPR. Instead, businesses must navigate a patchwork of regulations—HIPAA for healthcare, the Gramm-Leach-Bliley Act for financial institutions, and a growing number of state-level privacy laws. Adding to the complexity, the Federal Trade Commission (FTC) has ramped up enforcement against companies that fail to implement strong data minimization practices, particularly when handling sensitive health and geolocation data. According to Peter S. Hyun, organizations should assume that a data breach or compliance challenge is not a matter of if but when. Companies expose themselves to unnecessary risk and regulatory scrutiny without clear retention and disposal strategies.

The panelists then went beyond compliance, sharing how data minimization drives efficiency and cuts costs. Expert panelist Tara Emory highlighted that a strong information governance strategy is key—organizations need to understand their data, map legal requirements, and proactively address risk. Esther Birnbaum spoke about the financial impact, noting, “The more data you have, the harder it is to stay in compliance.” Excess data doesn’t just increase legal complexity—it also escalates security risks, inflates storage costs, and complicates eDiscovery processes. The discussion also featured a real-world cautionary tale: A telecom relay service provider mistakenly retained call content data for years despite regulations requiring immediate deletion after each call. The result? A compliance nightmare—a reminder that without clear data retention policies, regular audits, and company-wide awareness, organizations can quickly find themselves in legal hot water.

The experts then discussed the role of emerging technologies, particularly Generative AI (GenAI), in identifying, classifying, and managing sensitive data. However, the panel cautioned that AI-driven data minimization must be carefully implemented—organizations must ensure they don’t inadvertently delete data required for legal holds, contracts, or essential business operations. The key takeaway? Data minimization isn’t just about deleting data—it’s about keeping only what’s necessary. Organizations can improve security, streamline operations, and stay ahead of regulatory changes by taking a proactive, strategic approach.

Watch the recording and read the transcript below to learn more about strategies for effective data minimization.


Transcript

Moderator

Hello everyone, and welcome to today’s webinar. We have a great session lined up for you today. Before we get started, there are just a few general housekeeping points to cover. First and foremost, please use the online question tool to post any questions you have, and we will share them with our speakers. Second, if you experience any technical difficulties today, please use the same question tool, and a member of our admin team will be on hand to support you. And finally, just to note, this session is being recorded, and we’ll be sharing a copy of the recording with you via email in the coming days. So, without further ado, I’d like to hand it over to our speakers to get us started.

Chris Wall

Thank you, Mouna. Hello everyone. And welcome. My name is Chris Wall, and I’m a DPO and Special Counsel for Global Privacy and Forensics at HaystackID. On behalf of the entire team at HaystackID, I’d like to thank all of you for attending today’s presentation and our discussion titled “Data Minimization: Why Less is More.” Today, we’re going to talk about what data minimization means, why it’s a thing, why it’s a good thing, and some of the considerations for minimizing your use of personal information. So, I’m going to be our guide for today’s discussion. It’s part of HaystackID’s ongoing educational series. These webcasts, of which today is one, are really designed to provide helpful insights and to help you stay ahead of the curve in achieving your cybersecurity, information governance, privacy, and eDiscovery objectives. Today’s webcast is being recorded for future on-demand viewing. After today’s live presentation, we’re going to make the recording and a complete presentation transcript available on the HaystackID website. Joining me today are three awesome, awesome panelists. They’re leaders in the privacy, information governance, and AI fields. We have Tara Emory, Peter Hyun, and Esther Birnbaum. Our goal today is to arm you with actionable insights into streamlining data practices and engaging in good data and privacy hygiene while strengthening compliance and operational efficiency. And I’ll add now what’s really become a standard disclosure by saying that our panelists today are speaking on their own behalf, and the comments and views they express may or may not reflect the views or positions of their respective employers or the organizations that send them paychecks. So with that, let’s just take a moment and let each of you introduce yourselves. Let’s start with Tara. Go ahead.

Tara Emory

Okay. Thanks, Chris, for having me here today. I am excited to talk about data minimization; it’s important for companies to understand how to run a program designed around the concept. I am Special Counsel at Covington & Burling in the eDiscovery, AI, and Information Governance Group, so I advise clients on all three areas, including information governance, which encompasses data minimization, and AI governance, which is becoming increasingly important for companies to get their arms around. My prior experience includes acting as a consultant on these types of issues, serving as a testifying expert in court, and working as a general counsel. Before that, I was an antitrust attorney.

Chris Wall

Thanks, Tara. Good to have you on today. Peter.

Peter S. Hyun

Yes. Hi. Thank you, Chris, and thank you, everyone, for having me. Thank you to HaystackID and my co-panelists here. Very happy to be here. I just recently left my post over at the Federal Communications Commission, where I was serving most recently as the Acting Chief of the Enforcement Bureau, which is the largest bureau or office within the FCC, where we regulate everything in the communications sector. That ranges from subsea fiber optic cables to satellites and, of course, broadcast, anything that’s broadcast over the airwaves, including what folks may be familiar with, Super Bowl halftime shows, and whatnot. But prior to joining the FCC, I was at the Department of Transportation, where I worked on congressional investigations and crisis management, and before that, I was in various senior roles at the Justice Department, including the legislative affairs office as well as the Associate Attorney General’s office that oversaw all of the 13 core DOJ litigating components. Before that, I was in private practice, and I also spent some time on Capitol Hill working on the Senate Judiciary Committee on cybercrime, the False Claims Act, government contracting issues, and whatnot. I am so very appreciative to be here. Thanks so much.

Chris Wall

Thank you, Peter. Esther.

Esther Birnbaum

Hey everybody. I am Executive Vice President of Legal Data Intelligence at HaystackID, which I recently joined. I came over from Interactive Brokers, where I was an Associate General Counsel, headed up the eDiscovery department, and handled any other area of legal advice related to technology, which was plentiful at a fintech company. Prior to that, I had many years of law firm experience. I came over to HaystackID to focus on building out our legal data intelligence department and really integrating generative AI (GenAI) not just into our eDiscovery workflows but across business and compliance areas in general.

Chris Wall

Well, great. Thank you, Esther. And it’s good to have you at HaystackID. Finally, as I mentioned, my name’s Chris Wall. I’m DPO and In-House Counsel and Chair of our Privacy Practice at HaystackID, an eDiscovery, privacy, data security, and forensic investigations firm. My job at HaystackID is to guide HaystackID and our clients through the privacy and data protection thicket of cyber investigations, information governance, and traditional discovery. But my job right now is to guide this discussion with our three amazing panelists. We’re going to be talking about minimizing the risk associated with PII by minimizing that PII in the first instance. This webinar is designed to help you make the best use of the 55 minutes to an hour you will spend with us, so we welcome your input. We’re a big crowd; it looks like we’re around 200 registrants, but we want to make this next hour or so as practically beneficial to you as possible. We’ll watch the chat box here, and if you have a question, please drop it into the Q&A feature, and we’ll try to address the questions as we go. So, with that, let’s dive in. As with any good discussion, let’s start with some definitions. Let’s start with a foundational exercise by talking about what this data minimization is, recognizing that we have folks listening today from all kinds of backgrounds. The same is true of our panelists, who come from all different backgrounds, and despite all of our panelists being lawyers, they all come at data minimization from different perspectives. So, I’m interested to hear how each of you, our panelists, defines this term or concept of data minimization. Let’s start with Tara. How would you define or describe data minimization?

Tara Emory

Data minimization is trying to control how much data your organization is retaining. So we have retention and we have disposition. And ideally, those things are done intentionally. So you intentionally retain certain things and you intentionally then also dispose of information that you don’t need. It obviously gets a lot more complex than that based on different needs, regulatory requirements, litigation profile, litigation hold obligations, and the nature of your business. But data minimization is the idea that we are going to dispose of data we do not need.
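Editor’s Note: To make Tara’s “intentional retention, intentional disposition” framing concrete, here is a minimal sketch of a retention schedule expressed as data plus a single decision function. The record classes and retention periods are invented for illustration; they are assumptions, not recommendations for any actual schedule.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical schedule: record class -> retention period in years.
RETENTION_SCHEDULE = {
    "contract": 10,
    "invoice": 7,
    "marketing_email": 1,
}

def disposition(record_class: str, created: date, today: Optional[date] = None) -> str:
    """Intentional decision per record: RETAIN while inside the scheduled
    period, DISPOSE once past it, REVIEW if the class isn't on the schedule."""
    today = today or date.today()
    years = RETENTION_SCHEDULE.get(record_class)
    if years is None:
        return "REVIEW"  # unscheduled classes need a human decision, not silent retention
    expires = created + timedelta(days=365 * years)
    return "DISPOSE" if today >= expires else "RETAIN"

print(disposition("invoice", date(2016, 3, 1)))  # DISPOSE once past seven years
```

The point of expressing the schedule as data is that every keep-or-dispose outcome is deliberate and auditable, which is the intentionality Tara describes.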

Chris Wall

I like the sound of disposition. All right. Peter, is there anything you’d like to add there? How would you like to define data minimization?

Peter S. Hyun

No, I think Tara’s definition covers it perfectly. What I was going to mention, just to give another dimension to this, is that, as I said earlier, I come from an enforcement perspective, having served at the FCC, where we regulated the communications sector. One of the increasing risks here, particularly when it comes to the types of data that are now so commonplace with hyperscalers and whatnot, is what you are seeing within the government and beyond, including when it comes to sensitive data and how the government tends to look at it. I mention this just as you all develop your own risk profiles and action plans. I just wanted to note three recent legal- and policy-type developments that demonstrate this. One was the TikTok brief that went all the way up to the Supreme Court, where the government highlighted, as one of the national security risks, the vast swaths of user data that were accessible to TikTok and, by extension, the People’s Republic of China. The second was a recent executive order that the Trump-Vance administration issued on foreign investment and the risk of data being accessed by possible foreign adversaries. Then there’s the bulk data EO [Executive Order]. That was another one where, again, you are increasingly seeing this concern about sensitive personal data. So this is a very, very appropriate topic. I just wanted to add that.

Chris Wall

Thanks, Peter. We’re most concerned about those large aggregations of data, and we’ll talk about the risk of having those large aggregations, especially of personal information sitting around. Esther, what would you like to add here?

Esther Birnbaum

Yeah. I think both of the other panelists here covered it pretty well, but I’m going to put on my corporate hat for a minute and say from a corporate perspective, anything related to data minimization, really the question is what do we have to do so we don’t get fined or sued? So, my perspective is always going to be that of a realist. Many companies existed before the GDPR, the CCPA, and other privacy regulations and laws were in place. Data minimization goes back to when we had rules about keeping, storing, and hosting data. So, from a company perspective, it’s always going to be what the regulations are and what we have to do to dispose or not dispose of data so we don’t get fined or have to pay money.

Chris Wall

Thanks, Esther. And look, I’ll be candid: the irony is not lost on me that we’ve got four lawyers on this panel. And lawyers are the worst, in my humble experience, when dealing with data minimization. We are the worst pack rats. Okay. Hey, I just drafted this marvelous brief. It’s a work of art, and I want to keep it forever. I’m going to frame it; I’m going to put it on my wall. We are the worst about keeping data around, so I recognize that. But if we translate that to all other areas of the business world, we would like to keep stuff around because, who knows, we might need it in the future. And I think our discussion today is about why that’s unnecessary. Hopefully, not necessarily. So it used to be that cybersecurity and information governance, including document retention and disposition, were things we knew we always needed to do. They were like those fruits and vegetables that we always knew we should be eating every day. We knew 20 years ago that we needed to practice healthy data hygiene. We knew we shouldn’t be keeping redundant, obsolete, and trivial data (ROT), but it was a hassle and a low priority. Then along came our midlife data doctor checkup, also known as the GDPR, and now 25 US state privacy laws, at least, here in 2025. And during that checkup, your chief privacy officer or your DPO tells you, “Hey, your privacy’s out of whack.” So when we look at ourselves in the mirror in 2025, it becomes abundantly clear just how important it was for us to have been eating all that leafy cybersecurity and fiber-rich info gov for the last 20 years. And privacy, I guess, is the catalyst. It’s the cholesterol of the information governance world. Now that our data hygiene is called into question because of these privacy regulations, we need to take a hard look in the mirror and examine it. So, let’s talk about some of those legal requirements that make us a little more data hygienic, I guess. Peter, do you want to lead off with some of the drivers that are now making us look at this data minimization principle and do what we probably should have been doing for the last 20 years?

Peter S. Hyun

Maybe where I will come from is just talking about legal standards, particularly at the federal level. Most people here, and I believe this is a very fluent audience, know that there’s no comprehensive federal privacy standard, similar to how there’s no comprehensive federal AI standard. The reason I lump those two together is that there have been competing pieces of legislation here in the United States over multiple Congresses, where major disputes centered on what a federal standard would look like, including probably the top three tension points: whether there would be private rights of action conferred on private citizens for various privacy-related claims; questions about federal preemption, which go to the point Chris made earlier about the patchwork of state laws that have been enacted and that folks here continue to watch; and civil rights protections. As a broader overview, I don’t see federal privacy legislation getting enacted in the next Congress, but again, it’s very hard to tell. I will say that last Congress, there was a bipartisan and bicameral approach, but now that Congress has changed hands, including the various committees of jurisdiction, it’s unclear whether it will be bipartisan and bicameral again. Having said that, at the federal level, there’s a patchwork of federal statutes that provide sector-specific protections. I mentioned, for example, that I had been at the FCC, but of course, in the financial sector, there’s the Gramm-Leach-Bliley Act; for healthcare, there’s HIPAA; and in telecom, there’s CPNI, a specially protected status of information known as customer proprietary network information. There are also federal statutes that protect the privacy of cable operator subscribers as well as satellite subscribers. That sector-specific approach also has a different dimension with the FTC. The FTC has a general consumer protection statute and authorities prohibiting unfair and deceptive practices, and it has more limited authority when it comes to children’s privacy. But when it comes to data and data minimization, one thing you’ve seen from the FTC, and will probably continue to see in this administration, is that beginning in 2022, the FTC brought over a dozen actions that included data minimization requirements, whether they had to do with sensitive health data used for advertising or geolocation data, which is very much top of mind. So, that is the overview on the federal side.

Chris Wall

Sorry. I was going to say, Peter, we see that from the FTC because they have a special charge for prosecuting those cases, right?

Peter S. Hyun

Correct. Correct.

Chris Wall

For privacy enforcement in the US, anyway. That’s a great background on whether we’ll ever see a comprehensive US privacy act or privacy law similar to the EU or UK GDPR. Like you said, we have a bunch of state laws, a patchwork of state laws. But California’s been the leader there, and frankly, I think California is probably the biggest hang-up we have right now. Before we can adopt a comprehensive US federal privacy law, we’ve got to address how it will interact with California. Tara?

Tara Emory

This is really about controlling data within an organization. Some of these statutes, such as HIPAA, require the retention of certain information. Patients have a right to have their doctors keep their records for an expected amount of time, depending on the state, and then be able to get access to them. A lot of domestic and international privacy laws don’t require you to keep private data, but you need it in some cases for the ordinary course of business. And then, if you have it, those laws provide rights to the people the data is about. So it may be in your interest to get rid of it so you don’t have to provide them all the information you have if they come asking for it, for example. What you need to do is look at what you are required to keep and what you may not be allowed to keep past a certain timeframe. And then, in the middle, what can become burdensome for you that you do not need, and what is of value? And then be ready. You have to know what data is kept where in your systems so that you’re prepared to respond with whatever your obligations are under these types of regulations. I think everyone’s familiar with the GDPR in the EU, and there are a host of similar regulations around the world, as well as all the state ones that Peter just went through. However, there are other considerations that can create more risk and more cost for an organization. Suppose it were to run into someone like Peter in his former role and get investigated. In that case, extra data that didn’t need to be around is now subject to legal hold: you can’t get rid of it, you may end up having to put it through the entire discovery process, and then it hangs around until that’s all dealt with, creating a lot of additional cost. It’s the same thing even if we’re not talking about private data; if there is a security breach, sensitive company information may get out and create additional problems of all types. That could be confidential information getting out, or it could be that they don’t even know what got out, and it might have been private or it might not. All of this comes back to an organization needing to know its data and legal requirements, look at where its other risks arise, and deal with all of its data in a responsible and operationalized way. You don’t want to just do a little cleanup here and there. You want an entire program built around managing your data.

Chris Wall

Yeah. Thanks, Tara. Esther, anything you want to add here?

Esther Birnbaum

I am going to throw a little wrench in and say data disposition laws are really important and differ across jurisdictions, sectors, et cetera. When you’re a global company, you often find that data retention regulations in one jurisdiction conflict with data disposition regulations in another. If you’re at a company that has required retention because of your industry, one jurisdiction may require a 10-, 12-, or 15-year retention period, while another requires you to dispose of inactive client data after six years. You’re in conflict, and there’s no guidance on which rule wins out. So, it’s not as simple as falling under one regulation when you’re a global company.
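Editor’s Note: Esther’s cross-jurisdiction conflict can be expressed as a simple window calculation. The sketch below is hypothetical (the jurisdictions, record class, and rule structure are invented for the example): it combines each jurisdiction’s retention floor and disposal ceiling and flags the case where one jurisdiction’s floor exceeds another’s ceiling, a conflict that software can surface but only counsel can resolve.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetentionRule:
    jurisdiction: str
    min_years: Optional[float]  # must keep at least this long (None = no floor)
    max_years: Optional[float]  # must dispose by this point (None = no ceiling)

def effective_window(rules):
    """Strictest retention floor, strictest disposal ceiling, and a conflict
    flag when the combined rules cannot be satisfied simultaneously."""
    floor = max((r.min_years for r in rules if r.min_years is not None), default=0.0)
    ceiling = min((r.max_years for r in rules if r.max_years is not None), default=float("inf"))
    return floor, ceiling, floor > ceiling

client_records = [
    RetentionRule("Jurisdiction A", min_years=10, max_years=None),  # industry retention mandate
    RetentionRule("Jurisdiction B", min_years=None, max_years=6),   # dispose of inactive client data
]
floor, ceiling, conflict = effective_window(client_records)
if conflict:
    print(f"Conflict: must retain {floor:g} years but must dispose within {ceiling:g}; escalate to counsel.")
```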

Chris Wall

We will talk about some of those specific issues, hopefully, as we go along, especially when we get to the case studies. I wanted to point out one more item here. We’ve got that nice circle on the slide that covers specific retention issues, and we do have specific disposition issues that arise, such as when a customer or a data subject requests to have their data deleted. You can do that with just about any organization of size these days: you can make a request for them to access your data to know what they’ve got on you and, importantly here, to ask them to delete your data. That’s what’s often referred to as the right of erasure or the right to be forgotten. Then there’s recycling. Every organization deals with getting rid of or updating hardware, and before you return an item to circulation, dispose of it, or recycle it, you want to make sure it’s been cleansed and that you’ve gotten rid of that data. That seems obvious. But one interesting example came up for us at HaystackID just in the last couple of weeks: we were faced with an Italian data protection authority restriction on the retention of employee email data, limiting retention to just a few days, not months, not weeks. You have to get rid of it within a number of days absent justification for any longer storage; if you provide some justification, you can keep it longer. I note that because retention and minimization issues like this Italian one pop up in individual jurisdictions. It’s a really good idea to make sure that you consult with your privacy officer or privacy counsel, if necessary, about specific minimization requirements in your jurisdiction or within your specific industry. I should mention one more, because we also have some frameworks up there; it’s not just laws and regulations. There are information governance frameworks and bodies like ARMA, with ARMA’s Information Governance Implementation Model, and GARP, of course, provides standards for data minimization. We can look to those standards, which apply across industries, business sectors, and jurisdictions. Is there anything else we want to discuss here about some of these drivers for minimization, guys?

Tara Emory

I just think the other one that has become even more common than before is that when companies engage in contracts with other businesses, there are sometimes additional protections or requirements, like agreeing not to keep anything beyond 12 months past the end of the relationship, or agreeing to protect the data if you get subpoenaed and to give the other party rights of objection. All of these are further things that relate to the need to be organized and, ultimately, to minimize data or follow whatever you’re bound to.

Chris Wall

Great point. Thank you, Tara. Let’s not forget about the contractual obligations there. Let’s move on to talk about the benefits. It’s not like we have to sell anybody on the real benefits of data minimization, but let’s talk about what they are. And I should mention here, too, because this question popped up: we will make this slide deck available, along with the transcript, which will include any of the case studies we discuss here. I hope that answers your question there. So, we discussed data minimization and why we have to do it. We talked about the laws, the regulations, the frameworks. Even though we should have been doing this for the last 20 years, we’ve discussed why we should be doing it now. So, let’s talk about some of the real benefits of data minimization, and let’s take each one in turn. We’ve got five listed up here. There might be more, but let’s flesh out these five. Number one is the core goal of data minimization: reducing the amount of unnecessary personal information stored by organizations, which, among other things, decreases the potential risk of data breaches and unauthorized access. Let me ask Tara, Peter, and Esther for your perspectives on the real goal here: data security and protecting privacy.

Tara Emory

I touched on this earlier, but the idea is that if you are going to be breached, the less data, the better. You mentioned internal access, and this has always been an issue. You can have unauthorized access within your own organization, depending on what your organization does. That itself may even breach requirements in contracts with other companies you provide services to, where certain things are only supposed to be accessed by one party or another. It is worth raising the question of how GenAI, especially the increasing use of AI in organizations, adds to this. Before, we were thinking more about someone stumbling into some data source they shouldn’t really have been in in the first place, and unless they’re doing it deliberately, that’s not something that will happen often if you’ve locked down your security. It’s becoming more important now that you may have a chatbot gathering all kinds of information from around the company and just telling any employee, who may not even have known what they were asking for, what’s in those documents. Having your security locked down, ensuring you know where your data is and what it’s subject to, and not having more than you need will all play into this new dynamic.

Chris Wall

I love that, Tara. If it’s not there, it can’t be breached. Esther, I’m going to go to you. AI, I know, is your bailiwick. You want to weigh in? I saw you perk up as soon as she said AI.

Esther Birnbaum

Before I go into AI, my background is in discovery. If data that’s not required to be retained is there, it’s discoverable. The question is whether or not it is beneficial. When it comes to data that you retain for business uses, you’re going to ask whether the business use case for keeping the data outweighs disposing of it. But once it’s there, it’s discoverable. The more data you have that’s discoverable, the more expensive discovery is going to be.

Chris Wall

All right, Peter. Is there anything else you want to add here?

Peter S. Hyun

The general expectation often is not if you will suffer a breach or some cyber incident, but when. When evaluating an institution, a company, or an organization, the question will arise as to what thoughtfulness or approach existed at the front end when it comes to data collection, retention, and protection, and whether there was adherence to it. For example, I mentioned the CPNI statute. When we were at the FCC, there was also broad authority over common carriers, including a broad prohibition against unjust and unreasonable practices. So when evaluating those kinds of questions, particularly post-breach or when an incident like that comes under scrutiny, there will be questions about the company’s or institution’s positioning before it happened, and whether it was considered within the scope of the precedent that has developed around unjust and unreasonable practices. So, having a very thought-out approach at the front end carries some weight, especially when we’re discussing the kind of sensitive information we mentioned at the top.

Chris Wall

Yeah. I like what you said there, Peter, in particular about whether you’ve been breached or you will be. I’ve heard it said there are three kinds of companies: those that have been breached, those that will be breached, and those that have been breached already and just don’t know it yet. Considering that, I think not having the data around is a safe approach. Let’s move on and talk about compliance briefly. From an altruistic standpoint, there is a real benefit; we’re benefiting society. But practically speaking, what is the compliance benefit other than just complying with laws and being in compliance?

Esther Birnbaum

Yeah. Compliance benefit is an interesting phrase. It’s usually to stay in compliance.

Chris Wall

You’re not fined, right? We don’t like penalties; we don’t like fines.

Peter S. Hyun

Concur. Stay in compliance.

Esther Birnbaum

The more data you have, the harder it is to stay in compliance and respond to requests to obfuscate or erase data.

Chris Wall

I agree. Esther, I think what you’re saying is that the more data you have, the more expensive and difficult it is for an organization to remain in compliance. So, ultimately, there are real costs, both in time and money, expended in keeping you compliant, aside from the good that comes from being in compliance.

Esther Birnbaum

Yeah. And I think that people underestimate that cost. It is an incredible cost. We are generating enormous amounts of data these days; our communications are spread across many different platforms, and we’re just using more and more data. And as Tara mentioned earlier, GenAI adds to that: if you’re a company with an internal-facing GPT and your employees are just typing in queries, that data may be retained as well. There is a way to look at the big picture of company data to understand the cost, which is definitely underestimated. A lot of what corporations, or the lawyers saying, “Hey, you should be in compliance with these data retention and disposition rules,” run into is that nobody wants to spend money on it. So, the way you can really engage upper management in a company is to lay out the processing costs and storage costs. As in-house counsel, I might not care about that bottom line to the extent upper management does, but they do. And that’s how you’re going to get them to agree to be in compliance or have better data hygiene: because it saves them lots of money.

Chris Wall

Yeah, Thanks, Esther. That’s a great segue into the overall cost associated with not just compliance but data minimization. Peter and Tara, is there anything you want to add about the cost savings, the real cost savings associated with minimization?

Tara Emory

Yeah. I like how Esther set that up. I’ve been on this ride long enough to have seen what motivated companies at different points in time to finally go ahead with information governance programs that involve a lot of focus on data minimization. In the early 2010s, the main focus was litigation readiness: if you get sued or investigated, you’re going to have all these extra costs. But that’s a reactive thing in the moment, and it’s really hard to get a proactive budget and get management on board with spending money on it. Then security breaches became commonplace and more of a focus, with investigations based on those, and that motivated more uptake of this type of process. Then we started having privacy laws. Companies understand well that you can be in or out of compliance with the law, and they are more willing to budget for it. We have seen that as there’s more and more data, hosting got cheaper per unit, but volumes were still increasing exponentially, and the longer you let the data sit over the years, the worse managing it became. I think we are mostly at a point where everyone knows that if it’s not dealt with, it’s a worsening problem that must be addressed. And behind all of that is overall efficiency and the numbers: how much the company spends, or could save, if it goes ahead with a program like this.

Chris Wall

Right, Tara. A former competition lawyer, or recovering antitrust lawyer, is here to talk about some of those efficiencies.

Tara Emory

Not quite, but thanks.

Esther Birnbaum

We don’t want to trigger any PTSD, though.

Tara Emory

I still work in that now. I’m back in a law firm and never gave it up on the discovery side. Antitrust eDiscovery is some of the biggest and baddest there is. It’s actually what prompted me, when I was a lawyer, to take up a consulting role for a while, because I saw there was such a need for companies to focus on managing their data, and companies that do manage their data have a much better time with a crazy Second Request. I know you’ll relate to that as well, Chris.

Chris Wall

Too well. Don’t send me into the shakes. Okay. Well, Peter, I’m going to ask you, though, about the real benefits of minimization. Is there anything else you want to add related to compliance penalties or overall cost?

Peter S. Hyun

Well, I’ll just share another short story, because I think it goes to what everybody has talked about but puts it in concrete terms. We had an investigation that is now public because of the resolution. It involved the largest provider of what are known as telecom relay services, which provided real-time captions on voice calls for hearing- and speech-impaired users. This great program served the public interest, and the company sought reimbursement for its captioning services. The programmatic rules prohibited the retention of call content information beyond the duration of the call. I can point you to where you can look, but the company had unwittingly retained the call content data for multiple years before discovering it. And on the backend, of course, it was one of these unscrambling-the-egg situations; that process was incredibly long and arduous, but ultimately a resolution was reached where the company would implement a data retention schedule, a data inventory, and corporate governance with respect to this. I mention that because, from that perch, you’re able to see that the thoughtful approaches both Tara and Esther mentioned are of the utmost priority, notwithstanding whether somebody sees or doesn’t see litigation risk or any other risk forthcoming.

Chris Wall

So Peter, let me ask you this. In that situation, in that scenario or case study you just gave us, how did you arrive at the period of time that that particular entity had to retain that data? We have a question here from one of our listeners: is there a centralized location that can give us a breakdown of my data retention or minimization requirements by jurisdiction or sector?

Peter S. Hyun

So, with that case in particular, it was grounded in the regulations for the program because, again, the provider had sought reimbursement for providing that service, the real-time captioning, and there were regulations in place that expressly prohibited the entity seeking those reimbursements from retaining call content information beyond the duration of the call. Ultimately, when the facts came to bear, it was consistent with what we’ve been saying throughout: folks throughout the organization may not have been aware of what was happening behind the scenes with respect to the data, the mapping, the retention schedule, and the inventory. So unfortunately … I say unfortunately because it doesn’t directly answer the question, but this was a specific example where minimization was, in fact, required, yet the front-end investment in thinking through a retention schedule and inventory was not being exercised throughout the organization.

Chris Wall

I’m going to thank Jacob for asking that question because it’s a great transition to number five on our screen: protection of individual privacy. We’ve got all the privacy pros on the call. And I’ll just throw it out for the three of you anyway. What is the standard from a privacy standpoint for retaining personal information? How long are you supposed to keep it? So I’ll start with the legal answer, and it depends, right? But the standard from a privacy standpoint, what is this period? Is it a day? Is it an hour?

Tara Emory

First, it depends on your jurisdiction and what you’re required to do. However, the general goal is to keep information only for as long as the business requires it, and you should be able to show that. You should be able to show why this type of information is important. For example, in hospitality, there’s all kinds of information about people who stay in hotels. They have profiles that contain how many pillows they like, what room they like, and all kinds of personal information. It benefits the business and the customer to keep that, as long as it’s kept secure. But as long as it remains there, the customer has certain rights, again depending on jurisdiction, regarding how that data is treated, modified, and disclosed. Those are all considerations for setting up a program. Esther referred to that cross-jurisdictional tension, which leads to some very complicated processes in certain companies trying to manage how this is done. It’s not always clear what the right answer is, but it comes back to surveying all the jurisdictions you operate in or have customers in and seeing your high and low requirements. Do you have a minimum, or do you have a maximum? What are your other requirements? And gaming that all out. This is also why companies don’t want to do it. It’s a lot of work and resource-intensive to set up, but you can’t get away from it. Anyone who’s managed not to get some semblance of this in place at this point is either in major jeopardy or has it coming to them eventually. The sooner it’s dealt with, the better. As I said before, privacy laws have been a real driver of information governance programs, because some jurisdictions have severe penalties for not complying, and it is so messy. I see GenAI having a really big role to play in identifying and cleaning up personally identifiable information that exists in places companies just weren’t able to know about, and even helping us manage compliance. It’s very early days, but I can definitely see it being useful in reducing what a burden this has been so far.

Chris Wall

Yeah. Thank you, Tara.

Esther Birnbaum

I will pick up there a little bit.

Chris Wall

Go ahead, Esther.

Esther Birnbaum

It’s a really excellent point. With GenAI, we’ve been doing a lot of testing on classifying data, which involves identifying and classifying PII. Depending on the company, anything with PII will be classified as sensitive and confidential. But GenAI’s capabilities go beyond identifying and extracting PII to applying those identifications to classifications. And moving upstream, where you’re doing that with data at its source, is going to help the process later, whether it’s for discovery, privacy, or a DSAR request where you have to obfuscate data: your data is already identified and classified. So I think the ultimate goal here, as Tara talked about with putting a robust program in place for all of this, is that you really can leverage GenAI for results in a much more efficient way than we’ve seen in the past.
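Editor’s Note: Here is a minimal sketch of the identify-then-classify pattern Esther describes. Simple regular expressions stand in for the GenAI classifier she has in mind; the patterns, labels, and the rule that any PII makes a document confidential are illustrative assumptions, not a production detector.

```python
import re

# Illustrative patterns only; a real pipeline would use a GenAI or NER model
# with far better recall than these regexes.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def identify_pii(text: str) -> dict:
    """Return the PII types found in a document, with the matched values."""
    return {label: pat.findall(text) for label, pat in PII_PATTERNS.items() if pat.search(text)}

def classify(text: str) -> str:
    """Map identification results to a sensitivity label, per the point that
    anything containing PII gets classified as sensitive or confidential."""
    hits = identify_pii(text)
    if "us_ssn" in hits:
        return "restricted"
    return "confidential" if hits else "internal"

doc = "Reach Jane at jane.doe@example.com or 555-867-5309."
print(identify_pii(doc))  # {'email': ['jane.doe@example.com'], 'us_phone': ['555-867-5309']}
print(classify(doc))      # confidential
```

Running this at the source, as documents are created or ingested, is the “moving upstream” idea: downstream discovery, privacy, or DSAR workflows then inherit data that is already labeled.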

Chris Wall

That’s absolutely true, and I think we need to dive into that a little bit here because the rise of GenAI presents a lot of great opportunities for us in terms of identification, extraction, and then the ability to act, whether that’s obfuscation or redaction of PII, or acting on it by notifying individuals as part of an incident response or a breach response. So we’ve been beating the get-rid-of-it drum pretty hard here; after all, the title of the webinar is “Less is More.” But it might be important for us to pause and note that there are times when minimization must itself be minimized. And Tara, you touched on some of these situations earlier, where our minimization efforts need to be tailored to the situation.

Tara Emory

You don’t want to delete data you were not supposed to. Obviously, one consideration is business value, and that’s what everybody thinks about. That tends to protect itself pretty well because the business is already aware that some of its data has value and is visible to them, and they will plan what is defined as a record around that so they can keep records for the appropriate amount of time. The less visible considerations are the ones that come from legal compliance obligations. We’ve talked a lot here about the regulations that require certain organizations to keep certain types of data for certain amounts of time. And then, of course, there’s legal hold. For an organization that does litigate at least occasionally, and I know there are organizations that are sure they’re never going to litigate, so maybe it doesn’t apply to them, but for anyone else, if you are not currently under a legal hold, there is no time like the present to work on data minimization, because you don’t have to worry about violating legal hold obligations in your data minimization project. As soon as you have a legal hold in place, you have a requirement not to delete anything that’s potentially responsive. And working around that when data is intermingled across sources is very difficult. It is done, I have done it, but it often requires having a lot of lawyers go in and separate the data and be sure you’re not deleting something that later turns out to have been, in retrospect, foreseeably relevant to the matter. So once you’re under legal hold, you have to work around that, and it’s so important to respect it. And then we’ve mentioned contractual obligations as well. There may be understandings with customers or other parties that you will hold on to some of their data for a certain amount of time, and you have to respect that too.
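Editor’s Note: Tara’s warning translates naturally into a gate that runs before any disposition decision, layered on top of a schedule check like the earlier sketch. The hold structure below is a hypothetical simplification; real “potentially responsive” determinations are far broader and belong with counsel, not code.

```python
from dataclasses import dataclass

@dataclass
class LegalHold:
    matter: str
    custodians: frozenset      # custodians whose data is frozen
    record_classes: frozenset  # record classes treated as potentially responsive

def blocked_by_hold(custodians: set, record_class: str, holds: list) -> bool:
    """True if any active hold touches the item; disposition must then wait,
    no matter what the retention schedule says."""
    return any(
        custodians & hold.custodians or record_class in hold.record_classes
        for hold in holds
    )

holds = [LegalHold("Matter 123", frozenset({"jdoe"}), frozenset({"contract"}))]
# Even an expired invoice stays frozen while its custodian is on hold:
print(blocked_by_hold({"jdoe"}, "invoice", holds))  # True
```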

Chris Wall

Yeah. Thanks, Tara. All I heard was a lot of lawyers, and that just started me thinking about expensive, expensive, expensive. So, it’s not an inexpensive process.

Tara Emory

No. It’s unavoidable if you’re always in litigation. There are large organizations that are never not on litigation hold; for the very, very largest, a hold-free moment may not come along every day, hopefully for them, but they can still benefit from data minimization. You just have to find the right moment or balance in time for what that looks like for a specific organization.

Chris Wall

When litigation ebbs, for instance, that would be a good time to engage in data minimization, right?

Tara Emory

Yeah. That’s right.

Chris Wall

I mentioned earlier that we were going to do case studies, and I’d like to address those now. We’ve talked about it in the abstract, and we’ve got three great panelists here with real-world experience dealing with minimization. So, it might be good for each of you to take maybe three minutes and talk about a real-world scenario where you’ve seen minimization in action. And we do have a question here; Esther, I think the scenario you want to talk about deals with AI, so maybe you can address the question of how AI can be used there. But let’s lead off with you, Peter, if that’s all right, with our first scenario.

Peter S. Hyun

Yep. I’m happy to. I just wanted to give a case study of a case at the commission that we resolved with a major wireless carrier that everybody would be familiar with, following a string of four breaches back in 2020, ’22, and ’23. The reason I think this is helpful is this question about how long you can hold onto data. To zoom out a little, as I mentioned earlier, the Communications Act confers on telecom carriers an affirmative duty to protect the confidentiality of what is known as customer proprietary network information. There’s some litigation on the exact scope of that, but there’s also a prohibition on unjust or unreasonable practices. What was interesting about the case is that the law restricts telecom carriers, common carriers, from disclosing or permitting access to this customer proprietary network information without customer approval and outside of a specific business purpose. But what is absent is this question about what you should actually do if you don’t want the data, or if you want to minimize it, so to speak. What was interesting about the case, and what you see more industry-wide, is that the telecom sector is often an attractive target for threat actors because it has so much pattern-of-life information and because of all the vulnerabilities arising from the complexity of its systems, from decades-old legacy infrastructure to today’s edge data centers, software-defined infrastructure, et cetera. You’re seeing these vulnerabilities, and there’s also been some discussion about third parties, supply chain risks, and vendors. All of those together led to a resolution where, for the first time, the mobile wireless carrier agreed to a data minimization, retention, and deletion term. It was the first time ever that we as regulators, and I say we, but I mean my former employer, had entered into something like that. The way it was specified in the consent decree was to limit the collection of consumer information to what is reasonably necessary for legitimate business or legal purposes and to maintain policies that provide for its destruction. There was an anonymization component, which we can get to later, but I mention it because it was an order that remained in effect for three years. In other words, the consent decree comes under the FCC order, and implementing those practices would require investment, with audits that could be performed both by outside parties and by the commission. I mention it as a way in which folks were thinking about this, particularly when it came to the quality of the information, but also, again, making sure that there’s a dynamic conversation throughout the institution. There were other things in that consent decree, including corporate governance and other cybersecurity requirements, including third-party assessments. But that was one case study that I wanted to share with this group.

Chris Wall

Yeah. Thanks, Peter. Tara.

Tara Emory

I think the most impactful data minimization cases I’ve been involved in are the ones that involve very extensive data mapping. The more complex and spread out the data, the more there is to go find. Some organizations are constantly acquiring new companies. Sometimes entire groups leave or get added, whole systems get stood up and shut down, or there’s just data sprawl, which creates a lot of dark data. That creates huge amounts of risk because the organization doesn’t know what it has or where it is, and it probably doesn’t need much of it, but some may be needed. So, you should work your way through to find the person most knowledgeable about each data source, learn what you can about it, and determine what should qualify as a record for the company based on all the types of information they have. Then you can pull that all together and identify what to act on. On one of the projects I worked on, starting maybe about 10 years ago or a little less, data lakes had become very much in vogue. It was, “Just throw whatever it is in the data lake. Don’t worry about it. We’ll deal with it later.” Anyone who works in discovery knows that is a nightmare, and it’s the same issue when you’re dealing with data minimization. I worked on a project where we ultimately deleted about 30 million records from a company by gathering them and working our way through some of these immense sources. Now, I have always found that the justification from these kinds of companies was that, one day, we could use it for something. Actually, I’m blown away by GenAI, and it relates to this question over here: well, now, who cares if we have all this data? We can just ask a question, and the robots will answer it. But there are still costs to that; having a system run through all that data is not free. And there’s still lots of risk; it doesn’t get around the risk question. On that issue, the jury is out. Some organizations are doing it, but the general-purpose models are evolving quickly, and they don’t need to learn from your data; they need access to it. But there’s still risk, even if your AI tool is skipping over data you’re not looking at. So, I still think you shouldn’t keep it all, but some of the objections will be overcome because of GenAI.
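Editor’s Note: The data-mapping first pass Tara describes, knowing what you have, where it is, and how stale it is, can start as small as a script. This hypothetical sketch surveys a file share and tallies stale candidates per top-level folder; a real data map adds owners, systems of record, and record-class interviews with the people most knowledgeable about each source.

```python
import os
from datetime import datetime, timedelta
from collections import defaultdict

def survey_share(root: str, stale_after_days: int = 5 * 365):
    """Aggregate file counts, bytes, and stale candidates per top-level folder.
    A crude first pass at 'know what you have and where'; untouched-for-years
    folders are candidates for the dark data Tara describes."""
    cutoff = datetime.now() - timedelta(days=stale_after_days)
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "stale": 0})
    for dirpath, _dirs, files in os.walk(root):
        top = os.path.relpath(dirpath, root).split(os.sep)[0]
        for name in files:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip broken links and permission errors
            bucket = summary[top]
            bucket["files"] += 1
            bucket["bytes"] += st.st_size
            if datetime.fromtimestamp(st.st_mtime) < cutoff:
                bucket["stale"] += 1
    return dict(summary)

for folder, stats in sorted(survey_share(".").items()):
    print(folder, stats)
```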

Chris Wall

All right. Thanks, Tara. I agree. Would you like to take a couple of minutes here, Esther?

Esther Birnbaum

Yeah. And I’m going to pivot to answer the question that was asked, which is about an LLM needing to learn, and the privacy and compliance implications there. There are many ways to do this privately and comply with any security regulations you might have. The easiest example: if your data is in Microsoft’s cloud, you can have a private instance or area of the cloud that’s closed off. It’s similar with an LLM; you can close it off for your own use, train on your data, that type of thing. But beyond that, the use case I would throw out here is a pharmaceutical company with a ton of regular litigation, so regular document review and coding, et cetera. Part of that is coding for privacy and PII. So, there is a lot of data already in your systems that has been used in discovery and coding, which could be helpful in the classification of data. Using GenAI, we can leverage the work that’s already been done to help train on and learn about the rest of the data in your environment. So, I know that’s not really what the question asked, but it’s a really big use case we need to start talking about, because training and classification work well with GenAI, and we can leverage the work product we already have to help train our classification systems. I think that’s going to be helpful moving forward, because one of the biggest problems with any data regulation is that, number one, you have to know where your data is, you have to know what your data is, and you have to know what’s in it. And there are many ways we can leverage AI or GenAI to do that. So sorry, Chris. I pivoted.

Chris Wall

I was going to mention that it’s not just the data you already have: you can actually use your records of the data you’ve disposed of to help train your system, to train your model. It’s not just the data you have but also the records of the data you’ve disposed of.

Esther Birnbaum

My last comment is I think that you’ll find that the lift to doing that is much smaller for classifying data than you would think.
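Editor’s Note: Here is a toy sketch of the “small lift” Esther describes, assuming scikit-learn and a handful of documents standing in for a real corpus of prior review coding. It reuses yesterday’s coding decisions to train a classifier for today’s unreviewed data; a GenAI-based workflow would swap out the model, not the pattern, and a real corpus would be thousands of coded documents, not four.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical prior work product: documents already coded in past reviews.
coded_docs = [
    ("Patient SSN 123-45-6789 enclosed per request.", "contains_pii"),
    ("Q3 roadmap attached for the marketing offsite.", "no_pii"),
    ("Call Jane Doe at 555-867-5309 re: her claim.", "contains_pii"),
    ("Reminder: building fire drill on Friday.", "no_pii"),
]
texts, labels = zip(*coded_docs)

# Train once on existing coding, then apply to unreviewed data.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Employee home address and DOB in the attachment."]))
```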

Chris Wall

So we have some immediate steps here that our listeners can take to minimize data. I know we are at the hour, so I’m not going to go through each one of these, but they are in the materials that will be available to everybody, and we’ve talked about each one at some point during today’s webinar. We’ve talked about classification. We’ve talked about assessing what data you’re using. We’ve talked about looking at your high-risk data and what you’re going to do with all of that data to make sure it’s in line with your privacy notice, your privacy policies, and your document retention and disposition policies. But in addition to that, we have a couple of additional considerations here. As we leave our webinar today, I want to ask each of our panelists to leave our audience with a few final thoughts: one thing our listeners could do today to move toward data minimization. And I’m going to start with you, Esther, then we’ll go to Peter and then Tara.

Esther Birnbaum

I think the effort starts today and moves forward. You’re going to have to look back at all the records you retained before these regulations or before your program was in place, but the best place to start, I think, is right now. Don’t hesitate, because then you’ll just have more data to deal with later.

Chris Wall

Right. Thank you. Peter.

Peter S. Hyun

I love the point about data mapping. In particular, in this era where there’s much more use of third parties and vendors having access to data, make sure the front-end legwork on the data mapping is done with all of the relevant stakeholders, and done with a level of both proficiency and fluency, so that folks know on the front end how they are planning under the various legal frameworks and other obligations they might have.

Chris Wall

Thanks, Peter. Tara, going to give you the last word here.

Tara Emory

I’d say evaluate your existing records and retention policy. What you’re looking for: does it address disposition as well as retention? Is it modern, up to date, and sensible for your organization? And is it being implemented as stated?

Chris Wall

Awesome. Thank you. I hope everybody wrote those things down. So again, we’re at our time. Many thanks to Tara, Peter, and Esther for sharing their insight over the past hour. Was that a wave? You want to say something, Esther?

Esther Birnbaum

No. I have to jump, but thank you, Chris. It was great.

Chris Wall

Well, many thanks to all of you for taking time out of your schedules to join today’s webcast. We really do value your time and appreciate your interest in this educational series and the good that you can do in your own sphere of influence by engaging in healthy data hygiene. So don’t miss our next webcast in two weeks, on March 12th, “Discovering Data Quickly in High Stakes White Collar Investigations.” As it happens, I’ve been asked to moderate that discussion as well, where our panelists are going to tackle some nuances and challenges that organizations face in white-collar investigations while trying to uncover evidence quickly, maintaining compliance with privacy requirements, and doing that across multiple jurisdictions. So check out our website, haystackid.com, to learn more about our webinars and to register for that upcoming webcast. Once again, thank you all for attending today’s webcast. We hope you have a great day. Thanks, everybody.

Moderator

That wraps up our master class. Thank you all for joining us today. Special thanks to our speakers, Tara, Peter, Esther, and Chris for their time and efforts in preparing and delivering this session. As mentioned earlier, the session was recorded, and we’ll be sharing a copy of the recording with you later today. Thank you once again and enjoy the rest of your day.

Chris Wall

Thanks, Mouna. And thank you too. Sorry we had to rush there at the end. I really appreciate your flexibility. Fantastic preparation, guys. Thank you.

Moderator

Thank you.


Expert Panelists

+ Esther Birnbaum
EVP, Legal Data Intelligence, HaystackID

With a robust background in complex litigation and regulatory compliance, Esther brings a wealth of knowledge and practical experience to the table. She uses her unique expertise at the intersection of technology, data, and law to develop best practices and drive innovative workflows across many areas of the business. She enjoys sharing her insights with the wider eDiscovery community and frequently speaks at conferences, webinars, and podcasts on topics related to law and technology.


+ Tara Emory
Special Counsel, Covington & Burling LLP

An experienced legal technology lawyer, Tara Emory’s practice includes AI legal applications and processes, AI and Information Governance, and e-discovery. Her expertise bridges areas of law, data compliance, and technical aspects of data and software. In litigation, Tara advises clients on e-discovery issues including search methodologies, machine learning/TAR, data preservation and collection approaches, discovery protocols, and strategies for resolving discovery issues with litigation adversaries. She has managed e-discovery compliance in dozens of Hart-Scott-Rodino (HSR) Second Requests, with over two decades of experience that includes both providing legal advice as an attorney, and managing the Second Request operations for discovery service providers.


+ Peter S. Hyun
Law, Policy, and Investigations Expert, Formerly with the FCC | DOT | DOJ | NY AG | U.S. Senate

Peter S. Hyun recently served as the Acting Chief for the Federal Communications Commission Enforcement Bureau. Peter joined the FCC Enforcement Bureau after serving in the U.S. Department of Justice (DOJ) in a variety of senior roles, advising DOJ’s top leadership as Acting Assistant Attorney General in the Office of Legislative Affairs, where he worked to develop and implement strategies to advance the Department’s legislative initiatives, manage oversight obligations, and navigate executive nominations in matters before Congress. Before that role, Peter served as Chief of Staff to the Associate Attorney General, where he helped oversee the DOJ litigating components (including the Civil Rights Division, Antitrust Division, and Civil Division) and grantmaking components.


+ Christopher Wall (Moderator)
DPO and Special Counsel for Global Privacy and Forensics, HaystackID

Chris Wall is DPO and Special Counsel for Global Privacy and Forensics at HaystackID. In his Special Counsel role, Chris helps HaystackID clients navigate the cross-border privacy and data protection landscape and advises clients on technical privacy and data protection issues associated with cyber investigations, data analytics, and discovery. Chris began his legal career as an antitrust lawyer before leaving traditional legal practice to join the technology consulting ranks in 2002. Prior to joining HaystackID, Chris worked at several global consulting firms, where he led cross-border cybersecurity, forensic, structured data, and traditional discovery investigations.


About HaystackID®

HaystackID® specializes in solving complex data challenges related to legal, compliance, regulatory, and cyber events. Core offerings include Global Advisory, Data Discovery Intelligence, the HaystackID Core® Platform, and AI-enhanced Global Managed Review powered by ReviewRight®. Recognized globally by industry leaders like Chambers, Gartner, IDC, and Legaltech News, HaystackID prioritizes security, privacy, and integrity in its innovative solutions for leading companies and legal practices worldwide.

Assisted by GAI and LLM technologies.

SOURCE: HaystackID