Bias in AI and Machine Learning: Sources and Solutions

Bias in AI and Machine Learning: Some Recent Examples

“Bias in AI” has long been a critical area of research and concern in machine learning circles, and awareness of it among general consumer audiences has grown over the past couple of years as familiarity with AI has spread. The term describes situations where ML-based data analytics systems show bias against certain groups of people. These biases usually reflect widespread societal biases about race, gender, biological sex, age, and culture.

There are two types of bias in AI. The first is algorithmic AI bias, or “data bias,” which arises when algorithms are trained on biased data. The second is societal AI bias, which arises when our assumptions and norms as a society create blind spots or skewed expectations in our thinking. Societal bias heavily shapes algorithmic bias, and as biased algorithms proliferate, their outputs reinforce those societal biases in turn, bringing things full circle.

Where does bias in AI originate?

We often hear the argument that computers are impartial. Unfortunately, that’s not the case. Upbringing, experiences, and culture shape people, and they internalize certain assumptions about the world around them accordingly. AI is the same. It doesn’t exist in a vacuum but is built out of algorithms devised and tweaked by those same people – and it tends to “think” the way it’s been taught.

Take the PortraitAI art generator. You feed in a selfie, and the AI draws upon its understanding of Baroque and Renaissance portraiture to render you in the manner of the masters. The results are great – if you’re white. The catch is that most well-known paintings of this era were of white Europeans, resulting in a database of primarily white people and an algorithm that draws on that same database when painting your picture. BIPOC people using the app had less than stellar results.

(PortraitAI acknowledges the problem, saying: “Currently, the AI portrait generator has been trained mostly on portraits of people of European ethnicity. We’re planning to expand our dataset and fix this in the future. At the time of conceptualizing this AI, authors were not certain it would turn out to work at all. This generator is close to the state-of-the-art in AI at the moment. Sorry for the bias in the meanwhile. Have fun!”)

Even the most state-of-the-art models exhibit bias

One of the coolest and most state-of-the-art technologies to come out of the world of AI over the past year is text-to-image generation, with models such as DALL-E, Midjourney, and Stable Diffusion. These apps are already generating millions of images daily, for applications ranging from stock photos for news stories to concept art for video games to multiple iterations of a marketing campaign.

However, as we’ve learned with virtually every new AI and machine learning development in recent memory, even the most advanced technology isn’t immune from bias, and these AI image generators are no exception. A recent paper by Federico Bianchi et al. finds that these models amplify dangerous stereotypes around race, gender, poverty, crime, and more; that these outcomes can’t be easily corrected or mitigated; and that they pose a serious concern to society.

Societal AI Bias: Insidious and Pervasive

Societal AI bias occurs when an AI behaves in ways that reflect social intolerance or institutional discrimination. At first glance, the algorithms and data themselves may appear unbiased, but their output reinforces societal biases.

Take Google Maps pronunciations. Google famously directed drivers to “turn left on Malcolm Ten Boulevard” (misreading Malcolm X Boulevard), which Twitter user @Alliebland pointed out as evidence of a lack of Black engineers on the Google Maps team. (The issue has since been corrected: http://www.mobypicture.com/user/alliebland/view/16494576)

Users also report that Google Maps struggles with accurate pronunciation of Hawaiian words in Hawaiian street names and with Spanish pronunciations of street names in states such as California and New Mexico. And yet, the app has never had issues understanding that the first “St” in “St John St” is pronounced Saint, not Street.

These “bugs” have been addressed over time, but they show that both the data and the people working with it operate through a particular white, Eurocentric, monocultural lens.

Societal bias in AI is difficult to identify and trace. It’s also everywhere. AIs trained on news articles show a bias against women. Those trained on law enforcement data show a bias against Black men. AI “HR” products show a bias against women and applicants with foreign names. AI facial analysis technologies have higher error rates for minorities.
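One practical way to surface disparities like these is to break a model’s error rates out by demographic group rather than looking only at overall accuracy. Below is a minimal sketch of that kind of disaggregated audit, assuming a pandas DataFrame of predictions with hypothetical column names; it illustrates the general pattern, not any particular vendor’s audit tooling.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

def error_rates_by_group(df, group_col, y_true_col, y_pred_col):
    """Report false positive / false negative rates per demographic group.

    Column names are placeholders; adapt them to your own dataset.
    """
    rows = []
    for group, sub in df.groupby(group_col):
        tn, fp, fn, tp = confusion_matrix(
            sub[y_true_col], sub[y_pred_col], labels=[0, 1]
        ).ravel()
        rows.append({
            group_col: group,
            "n": len(sub),
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Usage (hypothetical column names):
# report = error_rates_by_group(predictions_df, "demographic_group", "label", "prediction")
# print(report.sort_values("false_negative_rate", ascending=False))
```

A large gap in error rates between groups is exactly the kind of red flag that the facial analysis and HR examples above would have raised, had they been audited this way before deployment.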

The AI We Build Reflects Our Own Societal Bias

In another example of social biases showing up in AI-based decision-making, Google recently found itself in hot water for a function of its advertising system that allowed advertisers – including landlords or employers – to discriminate against nonbinary or transgender people. Those running ads across Google or Google-owned YouTube were given the option to exclude people of “unknown gender,” i.e., those who hadn’t identified themselves as male or female.

This effectively allowed advertisers to discriminate (whether purposefully or inadvertently) against people who identify as a gender other than male or female, potentially putting such advertising in breach of federal anti-discrimination laws. Google has since changed its advertising settings.

This is an example of algorithmic data bias being shaped by societal bias – one that gives people an opportunity to further embed their problematic biases via technology.

“… what’s wrong is that ingrained biases in society have led to unequal outcomes in the workplace, and that isn’t something you can fix with an algorithm.” – Dr. Rumman Chowdhury, Accenture

How a biased sports dataset can lead to racialized sports analysis

Another challenge that comes up is the impact of historical bias in longitudinal data sets.

Take a recent analysis of how sports commentators talk about white and Black athletes. The study’s authors noticed that commentators tended to focus on hard work and talent when talking about white athletes, while Black athletes were described in terms of their “God-given ability.”

The authors analyzed 1455 game broadcasts dating back decades to see what other examples of racialized language were apparent. There were plenty. Black players were more likely to be referred to by their first name and white players by their last name. Black players were often described in terms of their “natural gifts” and physical attributes (“beast”); white players were more likely to be described in terms of their performance and intellect (“smart”).

The racialized language persisted into the present day, but the dataset also reflected problematic language and formulations common in years past, showing the importance of accounting for cultural shifts when compiling data – while also addressing biases that endure today.
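To make the methodology concrete, a simplified sketch of this kind of analysis might count how often particular descriptors appear in commentary about each group of players. The word lists and transcript format below are hypothetical illustrations, not the study’s actual coding scheme or code.

```python
from collections import Counter
import re

# Hypothetical descriptor lists; the actual study used a far richer coding scheme.
PHYSICAL_TERMS = {"beast", "athletic", "explosive", "gifted", "freak"}
COGNITIVE_TERMS = {"smart", "intelligent", "disciplined", "savvy", "hardworking"}

def descriptor_counts(transcripts):
    """Count physical vs. cognitive descriptors per player group.

    `transcripts` is a list of (player_group, commentary_text) pairs.
    """
    counts = {}
    for group, text in transcripts:
        tokens = re.findall(r"[a-z']+", text.lower())
        c = counts.setdefault(group, Counter())
        c["physical"] += sum(t in PHYSICAL_TERMS for t in tokens)
        c["cognitive"] += sum(t in COGNITIVE_TERMS for t in tokens)
    return counts

# Usage with toy data:
# print(descriptor_counts([
#     ("Black", "He is a beast, such a gifted athlete."),
#     ("white", "A really smart, disciplined player."),
# ]))
```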

“The algorithms can only learn from people. They are taking in data, which is history, and trying to make predictions about the future,” says Sarah Brown, Postdoctoral Research Associate in the Data Science Initiative at Brown University.

AI biases may have worsened COVID-19 outcomes for POC

When COVID-19 hit, the medical establishment threw everything it had at the virus. This meant rushing to put out new findings – potentially using problematic AI-based prediction models in doing so.

It’s well documented that minorities have been disproportionately affected by the virus, both from an economic and health standpoint. Existing disparities in the healthcare system have worsened this outsize impact. When the research rushed at the problem suffered from unrepresentative data samples, model overfitting, and imprecise reporting, the results were never going to be ideal.
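Unrepresentative samples are often straightforward to catch before a model is trained or published: compare the demographic makeup of the study cohort against known population benchmarks. The sketch below assumes hypothetical group labels and benchmark shares purely for illustration.

```python
import pandas as pd

# Hypothetical population benchmarks (fractions of the relevant population).
POPULATION_SHARES = {"Group A": 0.60, "Group B": 0.19, "Group C": 0.13, "Group D": 0.08}

def representativeness_report(sample: pd.Series) -> pd.DataFrame:
    """Compare a study sample's demographic makeup against population benchmarks.

    `sample` is a Series of group labels, one per enrolled participant.
    Large gaps suggest the training data under-represents some groups.
    """
    observed = sample.value_counts(normalize=True)
    rows = []
    for group, expected in POPULATION_SHARES.items():
        got = float(observed.get(group, 0.0))
        rows.append({"group": group, "expected_share": expected,
                     "observed_share": got, "gap": got - expected})
    return pd.DataFrame(rows)

# Usage with a toy cohort:
# sample = pd.Series(["Group A"] * 80 + ["Group B"] * 10 + ["Group C"] * 10)
# print(representativeness_report(sample))
```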

“In healthcare, there is great promise in using algorithms to sort patients and target care to those most in need. However, these systems are not immune to the problem of bias,” said U.S. Sens. Cory Booker, D-N.J., and Ron Wyden, D-Ore.

While we’re still living through the fallout of this rapid-fire decision-making, the bias in these AI systems has the potential to affect resource allocation and treatment decisions – and likely already has.

How to Fight Back Against AI Bias

Artificial intelligence has the potential to do good in the world. But when it’s built on biased data and assumptions, it can harm how people live, work and progress through their lives. We can fight back against these biases by being attuned to the biases of the world we live in and challenging the assumptions that underpin the datasets we’re working with and the outcomes they offer.

We can start by reading widely, engaging with progressive ideas, and sharing helpful articles and research that can be used to educate others.

Your AI is only as woke as you are

Challenge your own beliefs about AI development. Don’t fight to be first: instead, learn how AI is fostering international cooperation. Take the approach of Yoshua Bengio, founder of the Montreal Institute for Learning Algorithms, who says, “If we do it in a mindful way rather than just driven by maximizing profits, I think we could do something pretty good for society.”

Make your company accountable when it comes to addressing and reducing AI bias. And take it to the top – this is something that executives, engineers, data scientists, and marketers all need to understand. By understanding the sources of algorithmic and data bias, we can diversify our data sets. By being more aware of the societal biases we live with every day, we can mitigate them in our work.
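One common, simple mitigation once you know a dataset is skewed is to reweight training examples so that under-represented groups are not drowned out. The following is a minimal sketch of that idea using scikit-learn; the variable names (X, y, groups) are placeholders for whatever your own pipeline uses, and reweighting is only one of several mitigation techniques.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def group_balancing_weights(groups):
    """Weight each example inversely to its group's frequency in the data,
    so smaller groups carry comparable influence during training."""
    groups = np.asarray(groups)
    _, inverse, counts = np.unique(groups, return_inverse=True, return_counts=True)
    return len(groups) / (len(counts) * counts[inverse])

# X, y are your features and labels; `groups` holds each row's demographic group.
# weights = group_balancing_weights(groups)
# model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```

Reweighting alone does not remove bias baked into the labels themselves, which is why it should be paired with the disaggregated error-rate audits described earlier.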

We can also take it to the streets – or at least the government. The EU’s General Data Protection Regulation (GDPR) suggests that there’s room for AI data regulation, too – but such regulation is, unfortunately, lagging. In fact, in some places, like the US and China, looser (or no) regulation seems to be the preferred path.

We can combat this by writing to our local and government representatives to support stronger oversight of how artificial intelligence is trained and deployed. We can also follow and support groups like the AI Now Institute, which are already arguing for the regulation of AI in sensitive areas like criminal justice and healthcare.

Further Reading on Bias in AI and Machine Learning

Whitepaper: Understanding Bias in Machine Learning

Artificial Intelligence Has A Problem With Bias, Here’s How To Tackle It

How white engineers built racist code – and why it’s dangerous for black people

What Unstructured Data Can Tell You About Your Company’s Biases

A.I. Bias Isn’t the Problem. Our Society Is

What is bias in AI really, and why can’t AI neutralize it?

‘A white mask worked better’: why algorithms are not color blind

Deepfakes Explained: What, Why and How to Spot Them

Some Artificial Intelligence Should Be Regulated, Research Group Says

To regulate AI we need new laws, not just a code of ethics

Stories of AI Failure and How to Avoid Similar AI Fails