Turkish lacks gendered pronouns: The single word “o” does the work that in English is done by “he,” “she,” or “it.” That linguistic quirk poses a challenge for machine-translation tools: to render a Turkish sentence into English, a tool like Google Translate must guess its subject’s gender — and in the process, often betrays its own built-in biases.

For example, Google translates the Turkish sentence “o bir doktor” as “he is a doctor” and the grammatically identical “o bir hemşire” as “she is a nurse.” Google’s algorithms similarly assume that a president or entrepreneur is male, but that a nanny, teacher or prostitute is female. Even character traits come with assumed genders: A hardworking person is judged to be male, while a lazy one is assumed to be female.

Google’s coders didn’t program those stereotypes; the software figured them out on its own, using AI to pore over real-world texts and create translations that mirror the way people actually use language. Google Translate’s biases, in other words, are our own. “Machine-learning algorithms pick up on patterns which are biased, and learn to be just as biased as the real world,” Microsoft Research’s Rich Caruana says.

That should give pause to HR managers who believe that algorithms are set to rid the recruitment process of prejudice. A 2016 article in CIO predicted that AI would transform hiring into “a process immune to any human biases.” A pocket calculator can’t be racist, the theory went, and an automated system for sorting resumes ought to be just as impartial: “By simply using an automated, objective process like this, it’s possible to drastically reduce the scope for human bias.”

It’s easy to see why managers yearn for a solution to bias. It’s been decades since the U.S.’s Civil Rights Act and the U.K.’s Race Relations Act banned overt discrimination, but less-explicit biases remain a major problem. Research shows that on average, a resume bearing a stereotypically African-American name is 36 percent less likely to receive a callback than an identical resume submitted under a white-sounding name, and that gap hasn’t changed since the 1970s, sociologist Lincoln Quillian says.

It’s not that managers are consciously discriminating against people, says Quillian, who last year published an analysis of 30 recruitment-bias studies involving 55,842 applications. Instead, decision-makers are swayed by unconscious biases, grounded in deep-seated prejudices they probably aren’t aware of. “Stereotypes are a resilient thing that are slow and difficult to change,” Quillian says. “That’s kind of the depressing reality — over 35 years, it hasn’t changed.”

Inside the ‘Black Box’

The growing awareness of unconscious bias has fueled interest in automated hiring systems, but the pervasiveness of such bias means that AI tools, trained using real-world data, are anything but impartial. Google Translate’s gender problem isn’t an isolated case. There are countless examples, across a range of industries, of machine-learning tools replicating the prejudices of their flesh-and-blood masters.

Researchers have found that facial-recognition tools were 99 percent accurate for white men but only 65 percent accurate for black women. Google’s online-ad algorithms were discovered to be more likely to show arrest-related advertisements to people with black-sounding names. And an AI tool widely used by judges was found to wrongly label black defendants as being at high risk of reoffending at almost twice the error rate it had for white defendants.
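To make the idea of unequal error rates concrete, here is a minimal Python sketch of how an auditor might compare false-positive rates across groups. It is an illustration only: the function name, the group labels "A" and "B" and the toy records are all hypothetical, and none of the figures come from the studies mentioned above.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return the false-positive rate per group.

    Each record is a (group, predicted_high_risk, actually_reoffended) tuple.
    A false positive is a person flagged as high risk who did not reoffend.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:  # only people who did not reoffend can be false positives
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


# Made-up toy data (assumption): ten non-reoffenders per group, plus some
# correctly flagged reoffenders that do not affect the false-positive rate.
records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("A", True, True)] * 5 + [("B", True, True)] * 5
)

rates = false_positive_rates(records)
print(rates)  # group A is wrongly flagged twice as often as group B
```

A tool can look accurate overall and still show a gap like this, which is why the comparison has to be made group by group.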

It’s easy to see how such biases could contaminate tools that match applicants to job vacancies or that inform decisions relating to bonuses and promotions. An AI tool scrutinizing applications for a programming position might well notice that a company’s current programmers are mostly men, and give a corresponding boost to male candidates. “You’ll wind up not recruiting women as aggressively for programming positions, because the bias in the sources is that women, by and large, aren’t programmers,” Caruana says.

And while AI tools can be ordered to ignore gender, race and other protected traits, eliminating bias from AI systems is harder than it looks. By their nature AI tools spot patterns, and many factors — ZIP codes, education, hobbies — can inadvertently become proxies for traits such as race or gender.

In one recent case, the recruitment service Gild boasted that its AI system had figured out how to spot high-potential computer engineers: Such candidates, it turned out, were likely to be fans of a Japanese manga website. Favoring candidates who liked manga, Gild’s team theorized, would help it identify high-quality engineers.

The problem, as data scientist Cathy O’Neil notes, is that there’s no causal connection between loving manga and being a successful engineer. Instead Gild had identified a trait that correlated with race and gender, effectively using a back-door approach to weed out people who didn’t fit its preconceptions about what a successful candidate should look like. “It was amazing to me it didn’t occur to them that that candidate who loved manga would be much more likely to be an Asian man,” O’Neil says. “You’re picking up gender via correlation and proxy.”
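The proxy effect O’Neil describes can be checked with very little code. The Python sketch below, written under stated assumptions, compares the share of one group among candidates who have a seemingly neutral feature with the share among those who don’t. The field names ("likes_site", "gender"), the helper functions and the ten toy candidates are hypothetical and are not drawn from Gild’s data.

```python
def protected_share(rows, attr, value):
    """Fraction of rows whose `attr` equals `value` (None if rows is empty)."""
    if not rows:
        return None
    return sum(1 for r in rows if r[attr] == value) / len(rows)


def proxy_gap(candidates, feature, attr, value):
    """Compare the share of a group among candidates with and without a feature."""
    with_feature = [c for c in candidates if c[feature]]
    without_feature = [c for c in candidates if not c[feature]]
    return (
        protected_share(with_feature, attr, value),
        protected_share(without_feature, attr, value),
    )


# Made-up toy data (assumption): fans of the site skew heavily male.
candidates = (
    [{"likes_site": True, "gender": "M"}] * 4
    + [{"likes_site": True, "gender": "F"}] * 1
    + [{"likes_site": False, "gender": "M"}] * 2
    + [{"likes_site": False, "gender": "F"}] * 3
)

share_with, share_without = proxy_gap(candidates, "likes_site", "gender", "M")
print(share_with, share_without)  # 0.8 0.4
```

In this toy example, a model that rewards the feature would be favoring a pool that is 80 percent male over one that is 40 percent male, even though gender never appears among its inputs.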

Worse still, O’Neil says, many AI systems perform such calculations within a “black box” that makes it impossible to figure out how results were arrived at. That might not matter if you’re teaching a computer to play chess or process images, but it’s a big deal when you’re making decisions about people’s careers. “Algorithms aren’t just helping automate a process; they’re adding a layer of opacity, so it’s harder to see and undo the failures,” O’Neil says. “With facial recognition it’s easy to see if it works; with hiring software it’s trickier.”

An Unspoken Issue — and a Regulatory Risk

Eliminating that kind of bias is a difficult task, but it’s a vital one if AI systems are to evolve into a trusted piece of the hiring pipeline, says Vineet Vashishta, chief data scientist at Pocket Recruiter. “In the recruitment and HR space, you have to be transparent,” he says. “We have to be able to explain ‘This is the exact way we made the decisions we made,’ and we have to be able to show there isn’t some insidious bias from the data.”

Unfortunately, Vashishta says, few people working on AI recruitment tools are willing to confront their systems’ potential biases. Even at academic and industry gatherings, he says, researchers are reluctant to get drawn into conversations about bias. “It’s three drinks and some heavy sighs later before people start talking about this,” he says. “There are very, very few people talking about this, at least openly, because it’s really scary. When it comes to machine learning, this is one of those areas that could kill the field.”

There’s a real chance that regulators could start to ban machine-learning tools in some jurisdictions, Vashishta says. New York City has created a task force to monitor government use of AI systems, and European regulators are implementing frameworks that will dramatically curb companies’ ability to use individuals’ data without explicit permission. “You might see some governments ban machine learning in the next five years, and it will be because of this issue,” he says. “It will be because it can’t be explained, and that lack of transparency and accountability, and the bias in the machine, scares people.”

But there are real ways to reduce the potential for bias, Vashishta says. Pocket Recruiter uses AI to identify applicants with specific skills rather than simply those who are similar to current employees. Still, it’s ultimately humans who make hiring decisions, and who provide the feedback that Pocket Recruiter uses to improve its algorithms. “Pocket Recruiter can show you candidates you might not have looked at because of unconscious bias,” Vashishta says. “But when it comes to what people tell us they want in a candidate, those unconscious biases creep back in.”

Ultimately, to eliminate bias from AI systems, designers must navigate the treacherous no man’s land between the way things are and the way they ought to be. To strip away real-world biases, designers must either skew their data or massage the way their systems manipulate that data so that the end result reflects a more desirable alternate reality.

But nudging applicant data one way or another is a fraught process. “We do it all the time,” Vashishta says. “But those bell-curve corrections scare us a lot. We don’t know if we’re hurting or helping, and we won’t know until we see the algorithm working.” It’s never easy to make changes that could potentially cost someone a job, he says. “You lose sleep over that sort of thing when you’re in a business like recruiting, because you’re affecting someone’s life. It’s an agonizing way to go.”
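One simple way to skew training data toward a more balanced picture is to reweight it so that each group counts equally. The Python sketch below shows that general technique only. It is not a description of Pocket Recruiter’s method, and the function name, group labels and 8-to-2 split are assumptions made for illustration.

```python
from collections import Counter


def balancing_weights(groups):
    """Give each example a weight so that every group has equal total weight.

    Members of an over-represented group get weights below 1, and members of
    an under-represented group get weights above 1.
    """
    counts = Counter(groups)
    total = len(groups)
    n_groups = len(counts)
    return [total / (n_groups * counts[g]) for g in groups]


# Made-up toy data (assumption): past hires are 80 percent men.
groups = ["men"] * 8 + ["women"] * 2
weights = balancing_weights(groups)

print(weights[0], weights[-1])  # 0.625 2.5
```

After reweighting, the eight men and the two women each carry a combined weight of 5. Whether that correction helps or hurts actual candidates is exactly the uncertainty Vashishta describes.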

No Peeking Behind the Curtain

One lesson for AI recruitment tools could come from an older, lower-tech remedy to bias in hiring: the decision by top orchestras to hang a curtain in front of musicians during auditions. Classical music used to be overwhelmingly male-dominated, but the introduction of blind auditions in the 1970s — to ensure musicians were judged on their playing rather than their appearance — has led to a more than fivefold increase in female musicians in top orchestras.

Getting to that point, though, also required a determination to make sure that the musicians’ gender wasn’t given away by other factors, such as shoes seen through a gap in the curtain or the clicking of heels on an uncarpeted floor.

Even more importantly, orchestras had to view blind auditions as a starting point rather than a definitive solution, and to accept that in an institution dominated by males for so long, bias would remain a problem. Despite having proven their musicianship, the first women admitted through blind auditions often had to work with conductors who continued to seek to sideline them.

That’s roughly where we are with AI recruitment tools: Like the orchestra’s curtain, machine learning is a paradigm shift with the potential to wipe away longstanding biases, but it isn’t in itself a solution to centuries of prejudice.

In order to make a difference, AI systems will need to be carefully designed and scrutinized to ensure both that they are genuinely bias-free and that they’re deployed in ways that don’t allow human decision-makers to reintroduce bias into the hiring process.

Getting to that point won’t be easy. “This is a problem we’ve spent hundreds of years creating. The long-term solution doesn’t feel like something we’ll get to in five years,” Vashishta says.

In the meantime, HR teams need to stop assuming that computer-based hiring tools are inherently objective. Instead, decision-makers should demand evidence that machine-learning systems have been designed with bias in mind, and that they are transparent enough for potential problems to be identified and corrected.

HR teams will have to insist on that level of transparency if AI tools are to evolve into truly unbiased hiring tools, Caruana says. “If enough people ask, it’ll become an important criteria,” he says. “But if nobody asks, we’ll just let the bias keep slipping in.”