Bias in AI and Machine Learning: Sources and Solutions


“Bias in AI” refers to situations where machine learning-based data analytics systems discriminate against particular groups of people. This discrimination usually follows our own societal biases regarding race, gender, biological sex, nationality, or age (more on this later). Just this past week, for example, researchers showed that Google’s AI-based hate speech detector is biased against black people.

In this article, I’ll explain two types of bias in artificial intelligence and machine learning: algorithmic/data bias and societal bias. I’ll explain how they occur, highlight some examples of AI bias in the news, and show how you can fight back by becoming more aware.

A quick note on relevance: searching Google News for “AI bias” or “machine learning bias” returns a combined 330,000 results. Meanwhile, Google Trends shows a 300% increase in interest for these terms since 2016. This is no coincidence. Artificial intelligence is already at work in healthcare, finance, insurance, and law enforcement. But every month we hear new stories of biased AI and machine learning algorithms hurting people.

Graph: Google Trends for “AI bias” (blue) and “machine learning bias” (red)

Algorithmic AI bias, also known as data bias, occurs when data scientists train their AI with biased data. Societal AI bias is less obvious, and even more insidious. Since algorithmic bias and data bias tend to go hand-in-hand, let’s explore them first.

Algorithmic Bias and Data Bias Explained

Humans are products of their experiences, environments, and educations. Similarly, artificial intelligence is a product of its algorithms and the data it learns from. As Harvard Professor Vijay Janapa Reddi puts it, “I tend to think of [AI] bias very much as what the model has been taught.”


Gender Shades, a project that spun out from an academic thesis, takes “an intersectional approach to product testing for AI.” In their original study, the University of Toronto’s Inioluwa Deborah Raji and MIT’s Joy Buolamwini tested demos of facial recognition technology from two major US tech giants, Microsoft and IBM, and a Chinese AI company, Face++.

Related article: (What is intersectionality? – YWCA Boston)

Among other takeaways, Raji and Buolamwini found that every instance of facial recognition technology they tested performed better for lighter-skinned faces than for darker-skinned faces. The reason? The two industry-benchmark facial analysis datasets they examined, IJB-A and Adience, are both “overwhelmingly composed of lighter-skinned subjects (79.6% for IJB-A and 86.2% for Adience).”

“The Black Panther Scorecard” showing how different facial recognition systems perform on characters from Marvel’s Black Panther – Joy Buolamwini on Medium

As part of their study, Raji and Buolamwini also examined three commercial gender classification systems. They found strong performance gaps between male and female faces. Darker-skinned females, for example, were misclassified up to 34.7% of the time, compared with a 0.8% error rate for lighter-skinned males. That’s a 1-in-3 failure rate for a task where you’d have a 50% chance of success just by guessing randomly.
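A quick sketch shows why disaggregated numbers like these matter: a classifier can look accurate overall while failing badly on a subgroup. The figures below are illustrative, loosely echoing the study’s pattern, and are not the study’s actual data:

```python
# Sketch: aggregate accuracy can hide subgroup bias. All counts below
# are fabricated for illustration -- not Gender Shades data.
from collections import Counter

# (subgroup, prediction was correct?) for a hypothetical classifier,
# evaluated on a dataset that skews heavily lighter-skinned
predictions = (
    [("lighter_male", True)] * 990 + [("lighter_male", False)] * 10 +
    [("darker_female", True)] * 65 + [("darker_female", False)] * 35
)

totals, errors = Counter(), Counter()
for group, correct in predictions:
    totals[group] += 1
    if not correct:
        errors[group] += 1

# Headline number looks fine...
overall_error = sum(errors.values()) / len(predictions)
print(f"overall error: {overall_error:.1%}")  # ~4.1%

# ...but disaggregating by subgroup reveals the gap
for group in totals:
    print(f"{group}: {errors[group] / totals[group]:.1%}")
```

Because the lighter-skinned group dominates the dataset, its low error rate swamps the headline figure, which is exactly why intersectional, per-subgroup evaluation matters.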

Some results from Raji and Buolamwini’s study – from Medium

A Case Study in Algorithmic/Data Bias: Amazon Rekognition

In another poignant illustration of algorithmic AI bias, the American Civil Liberties Union (ACLU) studied Amazon’s AI-based “Rekognition” facial recognition software. The ACLU showed that Rekognition falsely matched 28 US Congress members with a database of criminal mugshots. According to the ACLU, “Nearly 40 percent of Rekognition’s false matches in our test were of people of color, even though they make up only 20 percent of Congress.”

Infographic from the ACLU’s study of Rekognition’s accuracy – from the ACLU

This isn’t the first time that researchers have shown severe bias in Rekognition: Raji and Buolamwini’s work, for example, included Amazon’s system. And, writing on Medium shortly after publishing their study, Buolamwini pointed out that, “Unlike its peers, Amazon did not submit their AI systems to the National Institute of Standards and Technology (NIST) for the latest rounds of facial recognition evaluations. Their claims of being bias free are based on internal evaluations.”

Maybe Amazon could use part of their $129 million tax rebate to work on fixing Rekognition. According to the ACLU, “To conduct our test, we used the exact same facial recognition system that Amazon offers to the public, which anyone could use to scan for matches between images of faces. And running the entire test cost us $12.33 — less than a large pizza.”

The implications of these findings are terrifying. Facial recognition systems discriminate against darker-skinned suspects and are demonstrably unreliable at identifying female-presenting faces. Yet law enforcement agencies are already using facial recognition tools to (try to) identify suspects.

Related article: How white engineers built racist code – and why it’s dangerous for black people – The Guardian

Despite Demonstrated Bias Problems, Amazon Isn’t Backing Down on Rekognition

In January and February, Amazon executives Matt Wood and Michael Punke published blog posts questioning Raji and Buolamwini’s work. But within a month, 26 of the world’s leading AI researchers signed an open letter categorically refuting Wood and Punke’s arguments and calling on Amazon to stop sales of Rekognition to law enforcement agencies. They join a coalition of 68 civil rights groups, hundreds of academics, more than 150,000 members of the public and Amazon’s own workers and shareholders.

But Amazon isn’t backing down. On August 15th, they announced that Rekognition can now detect fear. Or, as Gizmodo put it, “Amazon Rekognition Can Now Identify the Emotion It Provokes in Rational People.”

And despite documented algorithmic bias with potential to ruin thousands of lives, Amazon is “essentially giving away” facial recognition tools to police departments in Oregon and Florida. They’re actively courting departments in California and Arizona. And they’ve pitched Rekognition to Immigrations and Customs Enforcement (ICE), sparking mass protests.

Societal AI Bias: Insidious and Pervasive

Societal AI bias arises when an AI behaves in ways that reflect deep-rooted social intolerance or institutional discrimination. In these cases, the algorithms and data themselves may appear unbiased. But the output or usage of the system reinforces societal biases and discriminatory practices.

Societal bias in AI is difficult to identify and trace. The complexity is demonstrated by a 2014 study of Google Ads. Researchers found that “setting the gender to female resulted in getting fewer instances of an ad related to high paying jobs than setting it to male.”

Screengrab of Google Ads Demographic Targeting Help Guide – full source

The conspicuous at-fault party here is Google, for allowing advertisers to target ads for high-paying jobs only to men. But it’s the advertisers who choose to display ads this way. In doing so, their choices reveal a societal bias: the assumption that men are better suited to these jobs.

So, do we criticize the advertisers for choosing to target ads this way, or do we blame Google Ads for allowing them to?

“…what’s wrong is that ingrained biases in society have led to unequal outcomes in the workplace, and that isn’t something you can fix with an algorithm.” Dr. Rumman Chowdhury, Accenture

A Healthcare Algorithm Used by “Almost Every Large Healthcare System” is Racist

In October this year, researchers uncovered a horrifying bias infecting an AI algorithm used by “almost every large health care system.” Their study, published in Science, found evidence of racial bias in an algorithm later reported by The Washington Post to be sold by healthcare tech giant Optum.

The purpose of the system is to help healthcare providers allocate patient care resources by flagging people with high care needs. As The Verge explains, the algorithm is based on data about how much it costs to treat a patient.

In theory, this metric is a proxy for how ill a patient is: more expensive to treat -> patient is more sick. In real life, however, unequal access to healthcare means that providers spend much less on black patients than on similarly-sick white patients.

It’s safe to say that the algorithm’s trainers, who are probably white and male, didn’t account for how this institutional societal bias impacts their data. As a result, black patients make up only 17.7% of those the algorithm flags for additional care. Correcting the bias would raise that number to 46.5%.
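The proxy failure is easy to reproduce in miniature. In this hypothetical sketch (the patients and dollar figures are invented; the real study analyzed millions of records), ranking patients by spending and ranking them by actual illness flag different people:

```python
# Sketch of the cost-as-proxy failure described above. The patients and
# numbers are fabricated for illustration only.
patients = [
    # (id, group, chronic_conditions, annual_cost_usd)
    ("A", "white", 4, 12_000),
    ("B", "black", 4, 6_000),   # equally sick, but less is spent on care
    ("C", "white", 2, 7_000),   # less sick, yet more is spent
    ("D", "black", 1, 3_000),
]

def flag_top_half(patients, key):
    """Flag the 'highest-need' half of patients by the given score."""
    ranked = sorted(patients, key=key, reverse=True)
    return {p[0] for p in ranked[: len(ranked) // 2]}

by_cost = flag_top_half(patients, key=lambda p: p[3])     # the proxy
by_illness = flag_top_half(patients, key=lambda p: p[2])  # ground truth

print("flagged by cost:   ", sorted(by_cost))     # A and C
print("flagged by illness:", sorted(by_illness))  # A and B
```

Patient B, who is just as sick as patient A, is passed over by the cost-based ranking simply because less money was spent on their care, which is the core mechanism the Science study describes.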

Read more in The Verge and The Washington Post.

Amazon Axes Their AI for Recruiting After It Demonstrates Societal Bias

Google isn’t the only tech company struggling with societal bias in their AI systems. Amazon made waves when they built and subsequently ditched an AI system meant to automate and improve the recruiting process for technical jobs. The project had lofty goals. As one Amazon engineer told The Guardian in 2018, “They literally wanted it to be an engine where I’m going to give you 100 résumés, it will spit out the top five, and we’ll hire those.”

Things didn’t go according to plan. Amazon realized their system had taught itself that male candidates were automatically better. Why? The algorithms they trained didn’t focus on coding ability and other IT skills. Instead, they favored candidates who described themselves using words that occur more frequently on male engineers’ resumes, including “executed” and “captured.” And they penalized résumés containing the word “women’s” and downgraded graduates of two all-women’s colleges.
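A toy model makes the mechanism concrete. The “training set” below is fabricated (Amazon never published theirs), and the naive frequency-ratio score stands in for their actual algorithm, but it shows how words that co-occur with historically-hired candidates end up favored:

```python
# Illustrative sketch of how a résumé model learns gendered proxies.
# The training data and scoring rule are invented, not Amazon's.
from collections import Counter

# Résumé snippets labelled 1 if the (historical, mostly male) hiring
# process advanced the candidate, 0 otherwise.
training = [
    ("executed migration captured requirements", 1),
    ("executed rollout led deployments", 1),
    ("captured metrics executed launches", 1),
    ("women's chess club captain built compilers", 0),
    ("women's coding society organiser shipped features", 0),
]

hired, rejected = Counter(), Counter()
for text, label in training:
    (hired if label else rejected).update(text.split())

def word_score(word):
    """Naive smoothed ratio: >1 favours a résumé, <1 penalises it."""
    return (hired[word] + 1) / (rejected[word] + 1)

print(word_score("executed"))  # 4.0 -- strongly favoured
print(word_score("women's"))   # ~0.33 -- penalised
```

The model never sees gender as a feature; it simply learns that words common on historically successful (male) résumés predict hiring, so “women’s” becomes a negative signal purely through correlation.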

The Societal Bias of Amazon’s AI was Inevitable

Amazon declined to comment on why this happened. But in short, the engineers trained their AI on résumés submitted to Amazon over a 10-year period. And then they benchmarked these résumés against current engineering employees.

Now, think about who applies to Amazon for engineering jobs. And who is currently employed on the engineering team? You guessed it: predominantly white men. This 2015 Seattle Times article shows that 64% of Amazon’s “non-laborer workforce” are white, and 75% of “professionals” are male.


Image sourced from The Seattle Times

Amazon’s self-reported 2018 data shows that 58.3% of their global employees are men, and 38.9% of their U.S.-based employees are white. Amazon’s data, however, includes all of their staff. The numbers, then, include warehouse staff who are more likely to be women and people of color.

Artificial intelligence can’t understand complex social context. So, from this data, Amazon’s AI learned that people with white- and male-looking features were the best fit for engineering jobs.

This is an example of societal AI bias in action: the data itself was technically clean; the algorithm seemed to be working in a logical way; but the output of the system reinforced misogynistic hiring practices.

The AI We Build Reflects Our Own Societal Bias

Bernard Marr, the international technology advisor and best-selling author, does a great job of summarizing the problem in his January 2019 article, “Artificial Intelligence Has A Problem With Bias, Here’s How To Tackle It”.

On Forbes, Marr writes,

“In very simplified terms, an algorithm might pick a white, middle-aged man to fill a vacancy based on the fact that other white, middle-aged men were previously hired to the same position, and subsequently promoted. This would be overlooking the fact that the reason he was hired, and promoted, was more down to the fact he is a white, middle-aged man, rather than that he was good at the job.”

According to Dr. Rumman Chowdhury, Responsible AI Lead at Accenture, using historical data to train an AI (like Amazon did) is all-but-guaranteed to create problems. Speaking to Marr, Chowdhury points out that “you’re assuming that the only reason people are hired and promoted is pure meritocracy, and we actually know that not to be true.”

She continues, “So, in this case, there’s nothing wrong with the data, and there’s nothing wrong with the model, what’s wrong is that ingrained biases in society have led to unequal outcomes in the workplace, and that isn’t something you can fix with an algorithm.”

How to Fight Back Against AI Bias

Artificial intelligence is doing a lot of good in the world. But bias in AI corrupts well-intentioned projects and tangibly hurts thousands of people. Fight back by staying vigilant and not getting carried away by the hype.

Reducing bias in AI begins with you. Start by spreading awareness. Read articles like this and the pieces we’ve linked to below and then use your knowledge to educate others.

You need to be woke if you want your AI to be woke.

Challenge your own ideas about AI development. Instead of calling it an “arms race”, learn how AI is fostering international cooperation. And follow people like Yoshua Bengio, founder of the Montreal Institute for Learning Algorithms, who says, “If we do it in a mindful way rather than just driven by maximizing profits, I think we could do something pretty good for society.”

Meanwhile, encourage your own company to take responsibility for reducing AI bias. Executives need to understand the impact of AI bias and support their teams in their fight against it. Engineers and data scientists need to understand the sources of algorithmic/data bias so they can work to diversify their datasets. And everyone needs to be more aware of societal biases, so we can look for them in our own work.

Image: Teacher teaching a class of computers

And finally, lobby your government. The EU’s General Data Protection Regulation (GDPR) set a new standard for regulation of data privacy and fair usage. It’s a good start, but it’s not enough. AI regulation is lagging behind. The Financial Times writes that China and the United States are favoring looser (or no) regulation in the name of faster development. Some people are even giving up and arguing that AI regulation may be impossible.

So, write to your congresspeople, senators or other government representatives. Tell them to support stronger oversight of how artificial intelligence is trained and where it’s deployed. And follow groups like the AI Now Institute, who are already arguing for regulation of AI in sensitive areas like criminal justice and healthcare.

Further Reading on Bias in AI and Machine Learning

Artificial Intelligence Has A Problem With Bias, Here’s How To Tackle It

How white engineers built racist code – and why it’s dangerous for black people

What Unstructured Data Can Tell You About Your Company’s Biases

A.I. Bias Isn’t the Problem. Our Society Is

What is bias in AI really, and why can’t AI neutralize it?

‘A white mask worked better’: why algorithms are not color blind

Deepfakes Explained: What, Why and How to Spot Them

Some Artificial Intelligence Should Be Regulated, Research Group Says

To regulate AI we need new laws, not just a code of ethics

Stories of AI Failure and How to Avoid Similar AI Fails

Categories: Artificial Intelligence, Insights, Machine Learning