Stories of AI Failure and How to Avoid Similar AI Fails

Don’t fall prey to the AI hype machine. These stories of AI failure are alarming for consumers, embarrassing for the companies involved, and an important reality check for us all. This article includes stories of recent, high-profile AI fails, as well as information and advice on how to avoid your own AI failure:

  • AI Failures From IBM, Microsoft, Apple and Amazon
  • “9 More Ways to Fail With AI” by the Chief Data Officer at Abe.ai
  • Why Maintenance is Critical to Avoiding an Embarrassing AI Failure
  • How to Get Real Value from Artificial Intelligence

Full disclosure if you’re new to Lexalytics: we provide a software platform that uses AI and machine learning to help people analyze text documents, including tweets, reviews and contracts. But the stories and the advice presented here are relevant for anyone involved in AI/machine learning – and anyone else, really.

Fail: IBM’s “Watson for Oncology” Cancelled After $62 million and Unsafe Treatment Recommendations

No AI project captures the “moonshot” attitude of big tech companies quite like Watson for Oncology. In 2013, IBM partnered with The University of Texas MD Anderson Cancer Center to develop a new “Oncology Expert Advisor” system. The goal? Nothing less than to cure cancer.

The first line of the press release boldly declares, “MD Anderson is using the IBM Watson cognitive computing system for its mission to eradicate cancer.” IBM’s role was to enable clinicians to “uncover valuable insights from the cancer center’s rich patient and research databases.”

So, how’d that go?

“This product is a piece of sh–.”

In July 2018, StatNews reviewed internal IBM documents and found that IBM’s Watson was giving erroneous, downright dangerous cancer treatment recommendations.

According to StatNews, the documents (internal slide decks) largely place the blame on IBM’s engineers. Evidently, they trained the software on a small number of hypothetical cancer patients, rather than real patient data.

The result? Medical specialists and customers identified “multiple examples of unsafe and incorrect treatment recommendations,” including one case where Watson suggested that doctors give a cancer patient with severe bleeding a drug that could worsen the bleeding.

From this Verge article:

“This product is a piece of s—,” one doctor at Jupiter Hospital in Florida told IBM executives, according to the documents. “We bought it for marketing and with hopes that you would achieve the vision. We can’t use it for most cases.”

In February 2017, Forbes reported that MD Anderson had “benched” the Watson for Oncology project. A special report from University of Texas auditors (https://www.utsystem.edu/sites/default/files/documents/UT%20System%20Administration%20Special%20Review%20of%20Procurement%20Procedures%20Related%20to%20UTMDACC%20Oncology%20Expert%20Advisor%20Project/ut-system-administration-special-review-procurement-procedures-related-utmdacc-oncology-expert-advis.pdf) found that MD Anderson had spent more than $62 million without reaching its goals.

Fail: Microsoft’s AI Chatbot Corrupted by Twitter Trolls

Microsoft made big headlines when they announced their new chatbot, Tay. Writing with the slang-laden voice of a teenager, Tay could automatically reply to people and engage in “casual and playful conversation” on Twitter.

Some of Tay’s early tweets, pulled from this Verge article:

@HereIsYan omg totes exhausted.
swagulated too hard today.
hbu?

— TayTweets (@TayandYou) March 23, 2016

@themximum damn. tbh i was kinda distracted..u got me.

— TayTweets (@TayandYou) March 23, 2016

@ArtsRawr like some og kush dank

— TayTweets (@TayandYou) March 23, 2016

Tay grew from Microsoft’s efforts to improve their “conversational understanding”. To that end, Tay used machine learning and AI. As more people talked with Tay, Microsoft claimed, the chatbot would learn how to write more naturally and hold better conversations.

Microsoft won’t say exactly how the algorithms worked, of course. Perhaps because of what happened next.

Less than 24 hours after Tay launched, internet trolls had thoroughly “corrupted” the chatbot’s personality.

By flooding the bot with a deluge of racist, misogynistic, and antisemitic tweets, Twitter users turned Tay – a chatbot that the Verge described as “a robot parrot with an internet connection” – into a mouthpiece for a terrifying ideology.

Microsoft claimed that their training process for Tay included “relevant public data” that had been cleaned and filtered. But clearly they hadn’t planned for failure, at least not this kind of catastrophe.
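For a sense of what “planning for failure” looks like in practice, here is a deliberately naive sketch of the kind of gate that has to sit between raw user messages and anything a chatbot learns from. The function names and word list are placeholders invented for illustration, not Microsoft’s code; Tay’s problem was precisely that hostile input flowed straight into its behavior with nothing like this in the way.

```python
# A deliberately simple sketch of a content gate for an online-learning chatbot.
# The names and word list below are invented placeholders; real systems need
# proper toxicity classifiers, rate limiting, and human review.
BLOCKED_TERMS = {"slur_1", "slur_2"}  # stand-in for a real moderation lexicon/model

training_buffer: list[str] = []

def looks_safe(message: str) -> bool:
    """Crude content check: reject messages containing blocked terms."""
    lowered = message.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def maybe_learn_from(message: str) -> None:
    """Only add user messages to the training data if they pass the filter."""
    if looks_safe(message):
        training_buffer.append(message)
    # Unsafe messages are dropped (or, better, routed to human moderators).
```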

After a cursory effort to clean up Tay’s timeline, Microsoft pulled the plug on their unfortunate AI chatbot.

Fail: Apple’s Face ID Defeated by a 3D Mask

Apple released the iPhone X (10? Ten? Eks?) to mixed, but generally positive reviews. The phone’s shiniest new feature was Face ID, a facial recognition system that replaced the fingerprint reader as your primary passcode.

Apple said that Face ID used the iPhone X’s advanced front-facing camera and machine learning to create a 3-dimensional map of your face. The machine learning/AI component helped the system adapt to cosmetic changes (such as putting on make-up, donning a pair of glasses, or wrapping a scarf around your neck), without compromising on security.
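To make that trade-off concrete, here is a minimal, purely illustrative sketch of the embedding-and-threshold pattern most face-recognition systems use. This is not Apple’s actual Face ID pipeline; the 128-dimensional vectors and the 0.8 threshold are made up. The point is that a match threshold loose enough to tolerate glasses or make-up is the same dial an attacker hopes to exploit with a mask.

```python
# Illustrative only: a generic embedding-comparison check, NOT Apple's Face ID.
# Real systems use learned neural-network embeddings plus liveness/anti-spoofing checks.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(enrolled: np.ndarray, probe: np.ndarray, threshold: float = 0.8) -> bool:
    """Accept the probe face if its embedding is close enough to the enrolled one.

    A threshold loose enough to tolerate glasses, make-up or a scarf is also
    the opening an attacker tries to squeeze a mask through.
    """
    return cosine_similarity(enrolled, probe) >= threshold

# Toy example with random 128-dimensional "embeddings"
rng = np.random.default_rng(0)
enrolled_face = rng.normal(size=128)
same_person = enrolled_face + rng.normal(scale=0.1, size=128)  # small cosmetic change
mask_attack = rng.normal(size=128)                             # unrelated face/mask

print(is_match(enrolled_face, same_person))  # likely True
print(is_match(enrolled_face, mask_attack))  # likely False
```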

But a week after the iPhone X’s launch, hackers were already claiming to beat Face ID using 3D printed masks. Vietnam-based security firm Bkav found that they could successfully unlock a Face ID-equipped iPhone by gluing 2D “eyes” to a 3D mask. The mask, made of stone powder, cost around $200. The eyes were simple, printed infrared images.

Bkav’s claims, outlined in a blog post, gained widespread attention, not least because Apple had already written that Face ID was designed to protect against “spoofing by masks or other techniques” using “sophisticated anti-spoofing neural networks”.

Not everyone was convinced by Bkav’s work. Publications such as Wired had already tried and failed to beat Face ID using masks. And Wired’s own article on Bkav’s announcement included some skepticism from Marc Rogers, a researcher for security firm Cloudflare. But the work – and this glimpse into the weakness of AI – is fascinating.

Fail: Amazon Axes their AI for Recruitment Because Their Engineers Trained It to be Misogynistic

Artificial intelligence and machine learning have a huge bias problem. Or rather, they have a huge problem with bias. And the launch, drama, and subsequent ditching of Amazon’s AI for recruitment is the perfect poster-child.

Amazon had big dreams for this project. As one Amazon engineer told The Guardian in 2018, “They literally wanted it to be an engine where I’m going to give you 100 résumés, it will spit out the top five, and we’ll hire those.”

But eventually, the Amazon engineers realized that they’d taught their own AI that male candidates were automatically better.

How did this AI fail happen? In short, Amazon trained their AI on engineering job applicant résumés. And then they benchmarked that training data set against current engineering employees.

Now, think about who applies for software engineering jobs. And who is most likely to be currently employed in software engineering? That’s right: white men.

So, from its training data, Amazon’s AI for recruitment “learned” that candidates who seemed whiter and more male were more likely to be good fits for engineering jobs.
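A minimal, hypothetical sketch of the mechanism: train a classifier on historical hiring decisions that were already skewed, and it learns to penalize anything on a résumé that signals “female”. The features, numbers and model below are invented for illustration and have nothing to do with Amazon’s actual system.

```python
# Hypothetical illustration of how historical bias leaks into a hiring model.
# The data, features and coefficients are invented; this is not Amazon's system.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5_000

# A feature that merely proxies for gender (e.g. "captain of the women's chess club"):
# 1 if the résumé signals "female", 0 otherwise.
female_signal = rng.binomial(1, 0.2, size=n)
years_experience = rng.normal(6, 2, size=n)

# Historical labels: past hiring skewed toward male candidates, independent of skill.
p_hired = 1 / (1 + np.exp(-(0.3 * years_experience - 2.0 * female_signal - 1.0)))
hired = rng.binomial(1, p_hired)

X = np.column_stack([years_experience, female_signal])
model = LogisticRegression().fit(X, hired)

# The model dutifully learns to penalize the gendered feature.
print("coefficient on years_experience:", round(model.coef_[0][0], 2))
print("coefficient on female_signal:   ", round(model.coef_[0][1], 2))  # strongly negative
```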

That’s the short version – the full story is even more painful. Our article on bias in AI and machine learning has more.

Fail: Amazon’s Facial Recognition Software Matches 28 U.S. Congresspeople with Criminal Mugshots

Amazon’s AI fails don’t stop there. In 2018, the American Civil Liberties Union showed how Amazon’s AI-based Rekognition facial recognition system falsely matched 28 members of Congress with criminal mugshots.

According to the ACLU, “Nearly 40 percent of Rekognition’s false matches in our test were of people of color, even though they make up only 20 percent of Congress.”

[Infographic showing racial bias in Amazon face recognition, from this article at ACLU.org]

In fact, that’s not even the first time someone’s proven that Rekognition is racially biased. In another study, University of Toronto and MIT researchers found that every facial recognition system they tested performed better on lighter-skinned faces. That includes an error rate of roughly 1 in 3 when identifying the gender of darker-skinned women. For context, that’s a binary task where you’d have a 50% chance of success just by guessing randomly.

This is, of course, horrifying. It’s not even an “AI fail” so much as a complete failure of the people, processes and organizations that built these systems.

I wish I could say that, faced with incontrovertible proof that they did a bad thing, Amazon did what they needed to fix their AI bias. But the story doesn’t end here. Law enforcement agencies are already trying to use tools like Rekognition to identify subjects. And despite these demonstrated failures – it’s algorithmic racism, really – Amazon isn’t backing down on selling Rekognition.

Seriously, just read this article from The Guardian: How white engineers built racist code – and why it’s dangerous for black people

Real Quick: 5 More AI Fails

Microsoft and Apple aren’t the only companies who’ve made headlines with embarrassing AI fails. In this feature, Srishti Deoras summarizes the “top 5 AI failures from 2017”.

In one story, Facebook had to shut down their “Bob” and “Alice” chatbots after the computers started talking to each other in their own language. And that’s just the beginning. Srishti continues with more examples from Mitra, Uber and Amazon.

Together, these 5 AI failures cover: chatbots, political gaffes, autonomous driving accidents, facial recognition mixups, and angry neighbors.

Srishti argues that these failures suggest companies should be more cautious and diligent when implementing AI systems.

9 More Ways to Guarantee an AI Fail

Writing on Medium, Francesco Gadaleta, Chief Data Officer at Abe.ai, explores 9 more “creative ways to make your AI startup fail”.

Francesco’s list is comprehensive, funny, and thought-provoking. It features some classic paths to failure, such as “Cut R&D to save money” and “Work without a clear vision”. But, Francesco says, “there is a plethora of ways to fail with AI”.

My favorite is #2, “Operate in a technology bubble.”

As Francesco points out, AI doesn’t always fail due to technical problems. Sometimes, the problem is a lack of social need or interest.

“Artificial intelligence technologies cannot be built in isolation from the social circumstances that make them necessary,” Francesco writes.

This is a fantastic point. In the rush to stay ahead of the technology curve, companies often fail to consider the impact of their inherent biases. This is particularly dangerous for companies working in data analytics for healthcare, biotechnology, financial services and law.

Just look at Watson for Oncology: data bias and a lack of social context doomed that AI project to failure and sent $62 million down the drain.

“Operating in a bubble and ignoring the current needs of society is a sure path to failure.” – Francesco Gadaleta

Francesco’s list is a must-read for any executive, developer or data scientist looking to add AI to their technology stack.

Why Maintenance is Critical to Avoiding an Embarrassing AI Failure

Plan for failure; work on your reaction times; adopt a change management model. Manifesto of a management consulting firm? No, it’s veteran data scientist Paul Barba writing for KDnuggets.

Just like a car, Paul explains, an AI can tick along for a while on its own. But failing to maintain it can destroy your project or product, and maybe even your company.

As cars become more complex, insurance companies advise owners to keep up with preventative maintenance before the cost of repairs becomes staggering. Similarly, as an AI grows more complex, the risks and costs of AI failure grow larger. And the longer you wait to repair your AI, the more expensive it’ll be.

“Through auditing, quantitative measuring and proactive organizational responsiveness, you can avoid the equivalent of blowing an AI gasket.” – Paul Barba

Just like your car, an AI requires maintenance to remain robust and valuable. And just like your car, you may be faced with a sudden, catastrophic failure if you don’t keep it up-to-date.

In this article, Paul explains how data scientists can avoid AI failure by maintaining their systems with new training data, methods and models.
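As a concrete illustration of that maintenance loop, here is a minimal sketch of periodic model auditing: measure accuracy on fresh labelled samples at regular intervals, and flag the model for retraining when it drifts too far from its deployment baseline. The class name, thresholds and numbers below are assumptions for illustration, not code from Paul’s article.

```python
# A minimal sketch of "auditing and quantitative measuring": track a model's
# accuracy on fresh labelled data and flag it for retraining when it drifts.
# Thresholds and names are illustrative assumptions, not from the KDnuggets article.
from dataclasses import dataclass, field

@dataclass
class ModelHealthMonitor:
    baseline_accuracy: float          # accuracy measured at deployment time
    max_drop: float = 0.05            # tolerated absolute drop before we intervene
    recent_scores: list = field(default_factory=list)

    def record_audit(self, accuracy_on_fresh_data: float) -> None:
        """Store the accuracy from a periodic audit on newly labelled data."""
        self.recent_scores.append(accuracy_on_fresh_data)

    def needs_retraining(self) -> bool:
        """Retrain when the average of recent audits falls too far below baseline."""
        if not self.recent_scores:
            return False
        current = sum(self.recent_scores) / len(self.recent_scores)
        return (self.baseline_accuracy - current) > self.max_drop

monitor = ModelHealthMonitor(baseline_accuracy=0.91)
monitor.record_audit(0.88)
monitor.record_audit(0.82)
print(monitor.needs_retraining())  # True: time for new training data or a new model
```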

How to Get Real Value from Artificial Intelligence in 2020 and Beyond

Big AI projects, such as Watson for Oncology and self-driving cars, get most of the press coverage. But as the past few years have shown, moonshots like these are the most likely to fail. And when they fail, they fail spectacularly (as we’ve been discussing).

Related article: How to Choose an AI Vendor

How, then, can you build an AI system that actually succeeds? The answer is deceptively simple:

Focus on solving a real business problem.

Our own CEO, Jeff Catlin, has spent the past 19 years watching AI and machine learning get over-hyped and under-deliver. In this article on Forbes, he examines a number of business applications where AI solutions can:

  • Predict customer churn
  • Create better surveys
  • Read and handle online reviews
  • Craft effective messaging

“Building a business case for AI isn’t so different from building one for any other business problem,” Catlin writes. “First, identify a need and a desired outcome (automation and efficiency are common drivers of successful AI projects). Then undertake a feasibility assessment.”

The key is to look for business use cases where AI is already in action, or where it’s emerging as an effective solution.

Jeff puts it best: “With the right business case and the right data, AI can deliver powerful time and cost savings, as well as valuable insights you can use to improve your business.”

Read Jeff’s article on Forbes: Using AI to Solve a Business Problem

Further Reading on AI Best Practices and AI Applications

Artificial Intelligence for Disaster Relief

AI in Healthcare: Data Privacy and Ethics Concerns