The world is an unfair place. This is not news. The corny hyper-masculine protagonist faces a series of mysterious misfortunes, shrugs, and walks on unfazed, chanting the timeless mantra: "err, suck it up." And as it is with all things in the human world, so it is in the world of artificial intelligence: those human-trained, human-biased, thoroughly human machines.
Yes, the same snap judgements we pass in the cereal aisle, the quiet assumptions we carry like emotional hand luggage — they have, with admirable efficiency, been laundered into the very datasets that now make consequential decisions about people's lives. Remarkable, really.
"Bias, it turns out, does not politely wait at the door when the engineers go home."
Enter Microsoft, speaking loudly and clearly about fairness, armed with something called Fairlearn: a toolkit with two principal components, one to understand the mess (assessment, in the documentation's terms) and one to attempt to clean it up (mitigation). The company is refreshingly candid that this is not merely a technological challenge. It is, they insist, a socio-technological challenge, which is academic shorthand for "deeply complicated and also everyone's fault."
The potential harms of a biased AI system are, as the authors catalogue, admirably varied. There is unfair allocation of opportunities, resources, and information — or, as it is known in human experience, simply: how things work when they are not working in your favour. There is differential quality of service across demographic groups, which will be familiar to anyone who has ever paid for a software subscription and discovered what the word "tier" really means.
Then there is the reinforcement of existing stereotypes, delivered now at machine speed. And beyond that, if you were hoping the list would stop at vaguely insulting, it does not: there is the generation of outright offensive content, what the documentation calls "denigration harms" and what the rest of us might call a machine that has learned to curse or, in its more generous moments, to dispense its so-called words of wisdom. Finally, there is under-representation: treating certain groups as though they simply do not exist. Not unlike, the reader may note, a person one met on a dating application and subsequently never heard from again.
Fairlearn does, to its credit, come with a visualization dashboard: a tool to assess which groups will be negatively impacted by a given model, and to compare multiple models for both fairness and performance simultaneously. It covers regression and classification tasks. Its fairness metrics include demographic parity, equalized odds, worst-case accuracy, mean squared error, and worst-case log loss. Whether these terms mean anything to you is, the documentation implies, largely your problem.
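For the assessment side, here is a minimal sketch of what those per-group numbers look like in code, assuming a recent Fairlearn release where `MetricFrame` is available. The toy data, the LogisticRegression model, and the synthetic sensitive attribute are illustrative placeholders, not anything taken from the toolkit's own examples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

# Toy data: two features, a binary label, and a binary sensitive attribute.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
sensitive = rng.integers(0, 2, size=500)   # e.g. group A vs. group B
y = (X[:, 0] + 0.5 * sensitive + rng.normal(scale=0.5, size=500) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
y_pred = clf.predict(X)

# Accuracy computed separately for each group, plus the gap between groups.
frame = MetricFrame(
    metrics=accuracy_score,
    y_true=y,
    y_pred=y_pred,
    sensitive_features=sensitive,
)
print(frame.by_group)        # accuracy for each group
print(frame.difference())    # largest accuracy gap across groups
print(demographic_parity_difference(y, y_pred, sensitive_features=sensitive))
```

The dashboard wraps this same idea in an interactive view; a sketch like the above is what you would reach for in a plain script or notebook.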
"There are liars, there are statisticians, and then there are people who find a chart to prove their point — categories which, experience suggests, overlap considerably."
And crucially, refreshingly, the toolkit does not merely diagnose. It prescribes. Its post-processing algorithms leave the trained model alone and adjust its decision thresholds, group by group, until something like demographic parity holds. Its reduction algorithms go further: they reweight the training data and retrain. Again. And again. Ten, twenty iterations, nudging the numbers until the metrics behave. A neat little numbers game, one might observe, at which point one recalls the old distinction: there are liars, there are statisticians, and then there are people who simply find a chart that proves their point and pump it accordingly.
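In code, the two routes look roughly like this. This is a sketch assuming Fairlearn's `reductions` and `postprocessing` modules, reusing the same illustrative toy data and LogisticRegression base model as before; none of the names are prescribed by the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from fairlearn.postprocessing import ThresholdOptimizer

# Same toy data as the assessment sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
sensitive = rng.integers(0, 2, size=500)
y = (X[:, 0] + 0.5 * sensitive + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Reduction route: repeatedly reweight and retrain the base estimator
# until the demographic-parity constraint is approximately satisfied.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=sensitive)
y_pred_reduced = mitigator.predict(X)

# Post-processing route: keep the already-trained model and pick
# group-specific decision thresholds that equalize selection rates.
postprocessor = ThresholdOptimizer(
    estimator=LogisticRegression().fit(X, y),
    constraints="demographic_parity",
    prefit=True,
)
postprocessor.fit(X, y, sensitive_features=sensitive)
y_pred_post = postprocessor.predict(X, sensitive_features=sensitive)
```

The trade-off is roughly as the paragraph suggests: the reduction route pays for its constraint with repeated retraining, while the post-processing route is cheaper but needs the sensitive feature at prediction time.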
Where does Fairlearn fall short? Candidly and commendably, the authors tell you themselves. It cannot address stereotyping harms, denigration harms, or over- and under-representation harms at any deep level. It does not particularly concern itself with the broader societal dimensions of fairness: justice, due process, that sort of civilisational nicety. The current capability, one notes, extends only to group fairness, humanity sliced into cross-sections defined by measurable attributes. Individual fairness and counterfactual fairness, we are assured, are coming in a future release. One waits with the appropriate amount of patience.
Which brings us, inevitably, to you. Yes, you, the reader of this pile of considered analysis. What can you do? The answer, as with most genuinely hard problems, is: participate. Fairlearn is a community effort. The consequences of delegating judgement to machines, and the question of who bears responsibility when the machine decides, do not resolve themselves. Someone has to care enough to look at the dashboard, interpret the metrics, and ask uncomfortable questions about whose experience was not accounted for in the training data.
The world is unfair. The models learned from the world. The models are, consequently, unfair. This is not a technical problem with a technical solution. It is a human problem with a technical assist — and the assist only works if the humans show up.