In many ancient cultures people believed that everything had a spirit. Each human, animal, plant, and even each rock and bolt of lightning was understood as we understand ourselves. If we eat because we feel hunger, an animal must feel something similar when it eats. We still intuit this today (it’s called empathy), so we can’t judge our ancestors too harshly for thinking lightning must feel something similar to anger when it harms one of us.
Today, we have a tendency to apply this kind of thinking to computers and algorithms. After all, computer programming is the spooky voodoo of the day, especially machine learning, where even the wizards cannot claim to fully understand it. Absent a good framework to guide our intuitions, we do what humans do and empathize, anthropomorphizing the algorithms. We say, “the algorithm thinks X” or “the algorithm wants X,” as if computers were rocks we tricked into thinking with lightning. (Aren’t they? And is that mean to the rocks?)
This isn’t always problematic—finding ways to relate new ideas back to concepts we already understand is often a critical component of human learning. However, sometimes intuitions about “AI” algorithms are misguided, people imagine software works in ways it doesn’t, and users end up with disastrous results simply because they fell into the trap of anthropomorphizing something they don’t understand. Algorithms certainly don’t think the way we do, nor can they examine their own thinking the way we do. Just because you can figure something out doesn’t mean the AI can.
For example, consider an eDiscovery use case where attorneys are trying to train a model (or series of models) to find emails related to allegations specified in a civil complaint. Attorneys look at various documents and label them as responsive or not. The algorithm is here to help! It can read text and will make informed guesses about the labels. The algorithm should learn that the attorneys want to find all the emails about a particular product but are not interested in emails about football or opera.
Imagine an email Bob sent Charlie that says, “Do you want to grab lunch at noon?” Through past experience, The Algorithm predicts this document has nothing to do with the case. Most emails that Bob sent Charlie have nothing to do with the product, no other emails about lunch have anything to do with the product, and all other meetings related to the product took place at 10:00 AM. It has seen dozens of other emails from Bob to Charlie with almost the exact same text that were labeled irrelevant.
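To make this concrete, here is a minimal sketch of the kind of classifier at work, using TF-IDF features and logistic regression (via scikit-learn) as a stand-in for “The Algorithm.” The emails, labels, and product name (“Widget Pro”) are all invented for illustration; real eDiscovery products use their own features and models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Past attorney-labeled emails (1 = responsive, 0 = not responsive).
# All text and the product name are hypothetical.
past_emails = [
    "Widget Pro spec review at 10:00 AM tomorrow",   # responsive
    "Attaching the Widget Pro budget numbers",       # responsive
    "Do you want to grab lunch today?",              # not responsive
    "Lunch at the usual place?",                     # not responsive
    "Did you catch the football game last night?",   # not responsive
    "Two opera tickets for Friday, interested?",     # not responsive
]
past_labels = [1, 1, 0, 0, 0, 0]

# A common baseline: bag-of-words TF-IDF features + logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(past_emails, past_labels)

# The new email looks just like the ones labeled "not responsive," and the
# model has no way to know what was actually discussed at that lunch.
new_email = "Do you want to grab lunch at noon?"
print(model.predict([new_email]))        # expected: [0]
print(model.predict_proba([new_email]))  # second column (P(responsive)) is low
```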
Every attorney reviewing documents for this case knows that Bob and Charlie came up with the idea for the relevant product at that particular lunch meeting, so this email is indisputably responsive to opposing counsel’s document requests. Stupid algorithm! Duh, this document is obviously relevant. Everyone knows that—didn’t it listen to Bob telling us this last week?
(But before we bash the algorithm too badly, note that a keyword search for the product name would never have caught this relevant document either.)
We don’t realize that (today’s) AI does not have access to all the data we do, and the data we have access to is far more diverse. We have our entire life experience and all the inferences we can draw from it. We have access to the collective knowledge of our colleagues: a small facial expression can tell us whether a statement we made was interesting, doubtful, or taboo. AI might have learned from “big data,” but imagine if you could only learn from words, with no pictures, feelings, or experiences. It would be a bit harder to build context. And without context, AI is easily confused by inconsistent data. (For the record: I’m right there with you, algorithm.) We also have the advantage of being able to think about our own thinking. When things seem contradictory, we can gather more evidence, from sources like talking to people, that AI cannot access.
Another mistake is pedagogical oversimplification. Unlike children, AI does not benefit from simplified lessons: intentionally mislabeling your data to “simplify” the task is a subtle trap and likely counterproductive.
Consider a well-intentioned AI consultant cautioning reviewers to label documents based only on the four corners of the document; she explains that the AI will be confused if attorneys tell it a document is relevant for reasons it can’t understand. These emails about lunch are a conundrum! They don’t have the product name anywhere in them, so how would the AI understand that they’re relevant? The attorneys may think they should teach the AI that these emails aren’t relevant, even though they know the emails are. Oops: if most of the emails about lunch were relevant, the AI would have figured out that “lunch” was an important word. Instead, we confused it by thinking about our own thinking.
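A toy demonstration of how this backfires, under the same invented setup as before (hypothetical emails, labels, and product name): if reviewers deliberately mark the lunch emails not responsive, a simple bag-of-words model learns that “lunch” counts against responsiveness, which is exactly backwards.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Invented training emails; "Widget Pro" is a hypothetical product name.
emails = [
    "Widget Pro spec review at 10:00 AM",
    "Attaching the Widget Pro budget numbers",
    "Lunch at noon? Want to kick around the Widget Pro idea",
    "Grabbing lunch to talk through the new product",
    "Opera tickets for Saturday?",
    "Fantasy football draft is tonight",
]
# Honest labels: the lunch meetings really were about the product.
honest_labels = [1, 1, 1, 1, 0, 0]
# "Simplified" labels: lunch emails marked not responsive because reviewers
# decided the AI "couldn't understand" why they matter.
overthought_labels = [1, 1, 0, 0, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
lunch_column = vectorizer.vocabulary_["lunch"]

for name, labels in [("honest", honest_labels), ("overthought", overthought_labels)]:
    clf = LogisticRegression().fit(X, labels)
    print(f"{name} labels -> weight for 'lunch': {clf.coef_[0][lunch_column]:+.2f}")

# With honest labels, "lunch" gets a positive weight; with the "simplified"
# labels it gets a negative one, so future lunch emails look less responsive.
```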
Now, let’s imagine a slightly different scenario where Bob and Charlie frequently met for lunch to discuss the product. Most of the reviewers were paying attention when the case team explained that they are legally obligated to produce these kinds of documents. Even though a few attorneys labeled some of these emails not responsive, most of the team got them right and the algorithm keeps insisting that they are responsive. Good algorithm! It’s helping to correct human mistakes.
(And who would have guessed that the search term “Bob AND Charlie AND lunch” would have yielded so many responsive documents?)
But of course, the algorithm is not a conscious being with goals, desires, or critical thinking skills. It was designed and engineered for a specific purpose, and many factors inform its limitations. If we must imagine AI algorithms as sentient creatures, try to keep the following in mind:
- They can only learn from examples. Sometimes they need lots of examples. If too many examples are inconsistent or contradictory, they get confused.
- They cannot pay attention to evidence that’s not actually in the examples they’re given. If something is not in the dataset, or if the developers don’t explicitly feed it in, they cannot use it.
- They should be tested and evaluated every time you use them on new data. A model won’t necessarily work well this time just because it worked well last time. And as always, pay attention to the confidence interval, not the point estimate.
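As a rough illustration of that last point, here is one way to report uncertainty around a recall estimate from a validation sample, using a Wilson score interval. The sample size and counts are made up, and real workflows may prefer different statistics or tooling.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    halfwidth = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - halfwidth, center + halfwidth

# Hypothetical validation sample: 200 documents known to be responsive,
# of which the model flagged 170.
low, high = wilson_interval(successes=170, n=200)
print(f"Recall point estimate: {170 / 200:.2f}")  # 0.85
print(f"95% CI: ({low:.2f}, {high:.2f})")         # roughly (0.79, 0.89)
```

The point is that “the model found 85% of the responsive documents” means much less without a sense of how widely that estimate could swing on a different sample.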
Please don’t take any of this too literally. There are lots of learning algorithms and different approaches to feature extraction. Every product works differently, and it’s important to understand what works and what doesn’t for the product you’re using. Hopefully, developers will catch up and find ways to predict almost anything. Then humans only need to worry about being consistent with each other and with themselves. (Why is that so hard for us? Dear economists, how rational are we really?)