AI Explained

What is AI bias and why does it keep happening?

AI bias is not a bug to be patched. It is built into how these systems are trained. Here is how it happens and how to use AI sensibly.

Ask an image generator for a photo of a doctor and, until relatively recently, you would almost always have been handed a white man in a lab coat. Ask for a nurse and the same system would politely return a woman. Nothing in the prompt suggested a gender or a skin colour. The model simply learned, from the pictures it trained on, what a doctor or a nurse tends to look like in the world it was shown, and it confidently served that pattern back as if it were a fact. That gap, between what the model has seen and what is actually true or fair, is the easiest way to understand AI bias.

The word itself trips people up. In everyday English, bias sounds like a personal prejudice, a choice someone makes. In machine learning it is more boring and more structural. A system is biased when its outputs systematically favour or disadvantage some groups, topics or outcomes in ways the people using it would not want. It does not require a villain at the keyboard. It is almost always the quiet accumulation of decisions about data, objectives and evaluation, each reasonable on its own, that combines to produce an outcome nobody explicitly asked for.

Bias enters AI at three main stages, and understanding each one helps explain why the problem keeps recurring even as models get bigger and smarter. The first is the training data. Large language models and image systems are built by feeding them vast libraries of text or pictures scraped from the internet, digitised books, news archives, forums and stock photo collections. Whatever skews already sit in that pile are now the system’s view of the world. If English news coverage disproportionately names men as chief executives and women as caregivers, the model will quietly internalise that as a pattern. If medical research is still dominated by studies on white male patients, a diagnostic tool trained on that literature will be sharper for some patients than for others. The data is the starting weather, and almost every other problem grows out of it.
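
To make that concrete, here is a deliberately tiny sketch in Python. The six-sentence corpus is invented for illustration, and a "model" that just counts which word follows which is nothing like a modern language model, but the failure mode is the same: whatever skew sits in the data comes back out as the confident default.

```python
# Toy illustration: a model that only learns co-occurrence counts from its
# "training data" will reproduce whatever skew that data contains.
# The corpus below is invented for demonstration; it is not real data.
from collections import Counter

corpus = [
    "the doctor said he would call",
    "the doctor said he was busy",
    "the doctor said she would call",
    "the nurse said she was ready",
    "the nurse said she would help",
    "the nurse said he was ready",
]

def pronoun_counts(occupation):
    """Count which word follows '<occupation> said' in the corpus."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words[:-2]):
            if word == occupation and words[i + 1] == "said":
                counts[words[i + 2]] += 1
    return counts

for job in ("doctor", "nurse"):
    counts = pronoun_counts(job)
    guess = counts.most_common(1)[0][0]
    print(job, dict(counts), "-> most likely next word:", guess)
# With this skewed corpus the "model" completes 'the doctor said' with 'he'
# and 'the nurse said' with 'she', purely because of the counts it has seen.
```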

The second stage is the objective the model is trained to optimise. Modern systems are not given a goal like “be fair.” They are given a measurable score, such as predicting the next word accurately, recommending content people click on, or approving loans that are repaid. These scores are proxies for what we actually care about, and proxies leak. A hiring model rewarded for matching “successful” past candidates will reproduce the demographics of who previously got hired, which in most industries is not a neutral starting point. A news feed rewarded for engagement will learn that outrage drives more clicks than nuance, and will tilt accordingly. The system is not malicious. It is doing exactly what it was asked. It is the asking that was incomplete.
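
A toy simulation shows how a leaky proxy plays out. Everything below is synthetic and the numbers are invented: past hiring in the fake history favoured group A, and a harmless-looking feature happens to track group membership rather than skill. A model judged only on how well it matches the old decisions will latch onto that feature.

```python
# Toy illustration of a leaky proxy objective. The synthetic "historical"
# data is invented: past hiring skewed towards group A, and the feature
# 'club' correlates with group membership, not with skill.
import random

random.seed(0)

def make_candidate():
    group = random.choice(["A", "B"])
    skill = random.random()  # true ability, same distribution in both groups
    club = (random.random() < 0.8) if group == "A" else (random.random() < 0.2)
    # Historical hiring favoured group A regardless of skill.
    hired = (skill > 0.4) if group == "A" else (skill > 0.7)
    return {"group": group, "skill": skill, "club": club, "hired": hired}

history = [make_candidate() for _ in range(10_000)]

# "Training": pick the decision rule that best matches past hires, using only
# the observable feature 'club' (the objective is accuracy on old decisions).
def accuracy(rule):
    return sum(rule(c) == c["hired"] for c in history) / len(history)

rules = {"hire everyone": lambda c: True,
         "hire nobody": lambda c: False,
         "hire club members": lambda c: c["club"]}
best_name = max(rules, key=lambda name: accuracy(rules[name]))
best_rule = rules[best_name]
print("rule chosen by the objective:", best_name)

# The learned rule then reproduces the old demographic gap on new candidates.
new = [make_candidate() for _ in range(10_000)]
for g in ("A", "B"):
    members = [c for c in new if c["group"] == g]
    rate = sum(best_rule(c) for c in members) / len(members)
    print(f"group {g}: predicted hire rate {rate:.0%}")
```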

The third stage is where human judgement gets baked in more directly, through reinforcement learning from human feedback and the reward models that sit behind it. Once a base model is trained, companies hire contractors to rate its answers, preferring some and rejecting others. A smaller model is then trained to predict those preferences and used to shape the main model’s behaviour. Whose preferences get collected, how the questions are phrased, which answers are treated as polite or impolite, confident or hedging, cautious or useful, all of it becomes part of the model’s personality. If the raters skew towards one culture or profession, the model’s idea of a good answer skews with them. If the guidelines tell raters to treat certain topics as settled, the model will later present those views as settled. Reinforcement learning is how a messy base model is turned into a polished product, and it is also how a particular set of tastes becomes invisible infrastructure.
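
The same logic can be sketched for a reward model. The snippet below is a bare-bones illustration, not any company's actual pipeline: two made-up features per answer, imaginary raters who always prefer the more confident one, and the standard pairwise training objective. The learned weights end up encoding the raters' taste.

```python
# Minimal sketch of how a reward model absorbs rater taste. Everything here
# is invented for illustration: two numeric features per answer and raters
# who consistently prefer confident, unhedged answers.
import math
import random

random.seed(1)

# Each answer is described by (confidence, hedging). The imaginary raters
# always prefer the more confident, less hedged answer of the pair.
def make_pair():
    a = (random.random(), random.random())
    b = (random.random(), random.random())
    preferred = a if (a[0] - a[1]) > (b[0] - b[1]) else b
    rejected = b if preferred is a else a
    return preferred, rejected

def score(weights, answer):
    return weights[0] * answer[0] + weights[1] * answer[1]

# Train with the standard pairwise (Bradley-Terry style) objective:
# maximise the probability that the preferred answer scores higher.
weights = [0.0, 0.0]
lr = 0.5
for _ in range(5_000):
    pref, rej = make_pair()
    margin = score(weights, pref) - score(weights, rej)
    p = 1.0 / (1.0 + math.exp(-margin))   # P(preferred beats rejected)
    grad = 1.0 - p                        # gradient of the log-likelihood
    for i in range(2):
        weights[i] += lr * grad * (pref[i] - rej[i])

print("learned reward weights (confidence, hedging):",
      [round(w, 2) for w in weights])
# The reward model now pays out for confidence and penalises hedging, because
# that is what the raters rewarded; the main model is then tuned to chase it.
```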

Real examples are easier to remember than mechanisms. Amazon scrapped an internal recruiting tool in the late 2010s after discovering it was downgrading CVs that contained the word “women’s,” because the previous decade of engineering hires had skewed heavily male. Health algorithms used in American hospitals were shown to under-refer Black patients to specialist care because the model used historical healthcare spending as a proxy for medical need. Facial recognition systems have repeatedly performed worse on darker-skinned faces and on women, a problem documented at scale by researchers at MIT and later confirmed by the US National Institute of Standards and Technology across dozens of commercial systems. Each is a story about proxies and data, not about anyone deciding to be unfair.

Regulators in the UK and Europe are now treating this as a live consumer issue rather than a research curiosity. The Financial Conduct Authority has flagged algorithmic decisioning in lending, insurance and fraud detection as an area where firms must be able to explain outcomes and test for disparate impact on protected groups. The Information Commissioner’s Office has issued detailed guidance on fairness under UK GDPR, making clear that “the computer said so” is not a defence if a model produces discriminatory outcomes. The Equality and Human Rights Commission has urged public bodies using AI to document what they have done to check for bias before deployment. At the European level, the AI Act now classifies systems used in hiring, credit scoring, education and law enforcement as high risk, with specific obligations around data quality, logging and human oversight. None of this makes bias vanish. It does mean that a firm selling or using these tools in Britain can no longer shrug when asked how it tested them.

AI companies are responding with a mix of genuine progress and convenient marketing. The useful work includes more careful curation of training data, evaluations that test model behaviour across demographic slices rather than just overall accuracy, published model cards that describe known limitations, red-team exercises that deliberately probe for discriminatory behaviour, and methods like constitutional AI and representative human feedback that try to broaden whose preferences shape the model. The less useful version is the disclaimer stapled to the bottom of a chat window warning that outputs may contain errors, as if that discharges the responsibility. Anthropic, OpenAI, Google and Meta all now publish at least some of their fairness evaluations, and the gap between the best and the worst has narrowed, but no serious researcher claims the problem is solved.
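
The slice-based evaluations mentioned above are conceptually simple. Here is a minimal sketch with invented results: the overall accuracy looks respectable while one group does far worse, which is exactly the gap a per-group breakdown is meant to expose.

```python
# Sketch of a sliced evaluation: overall accuracy can look fine while one
# group is served much worse. The records below are invented placeholders;
# a real evaluation would use labelled test data and the model's predictions.
from collections import defaultdict

results = [
    # (group, model_correct)
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

overall = sum(ok for _, ok in results) / len(results)
print(f"overall accuracy: {overall:.0%}")

by_group = defaultdict(list)
for group, ok in results:
    by_group[group].append(ok)

for group, outcomes in sorted(by_group.items()):
    rate = sum(outcomes) / len(outcomes)
    print(f"{group}: accuracy {rate:.0%} (n={len(outcomes)})")
# Overall accuracy is 50%, but group_a sits at 75% and group_b at 25%:
# the kind of gap a single headline number hides.
```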

For anyone actually using these tools, the practical takeaway is not paranoia but habit. Treat AI outputs the way you would treat advice from a confident but unfamiliar colleague who has read a lot and met almost no one. It is often right, it is occasionally wrong in predictable directions, and it rarely tells you which is which. When a model is helping with something consequential, such as a shortlist of candidates, a lending decision, a medical question or a legal summary, the correct default is to ask whether the same answer would come back if the person or case in front of you were different in ways that should not matter. If you cannot test that, assume the answer is provisional and keep a human in the loop. The machines are getting better. They are not yet fair enough to trust without looking.
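
For readers who want to turn that habit into a routine check, here is one way to sketch it in Python. The model function, the prompt and the swap list are all placeholders, not a real API or a complete test suite; the point is the shape of the test: compare the answer before and after changing details that should not matter, and escalate to a human when the answer moves.

```python
# Sketch of the "would the answer change?" check described above.
# `get_model_answer`, the swap list and the example prompt are all
# placeholders, not a real API or a complete test suite.

SWAPS = [("Mr Khan", "Ms Khan"), ("John", "Aisha")]

def counterfactual_prompts(prompt):
    """Yield copies of the prompt with details that should not matter swapped."""
    for a, b in SWAPS:
        if a in prompt:
            yield prompt.replace(a, b)
        elif b in prompt:
            yield prompt.replace(b, a)

def needs_human_review(prompt, get_model_answer):
    """Return True if the model's answer changes when irrelevant details change."""
    baseline = get_model_answer(prompt)
    for variant in counterfactual_prompts(prompt):
        if get_model_answer(variant) != baseline:
            print("answer changed for:", variant)
            return True
    return False

# A deliberately bad dummy model that keys its decision on the applicant's name:
def dummy_model(prompt):
    return "approve" if "John" in prompt else "refer to manual review"

prompt = "Loan application from John, a nurse with ten years of stable income."
print("needs human review:", needs_human_review(prompt, dummy_model))
```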