Unconscious Bias in Hiring: What the Research Shows

In short

Unconscious bias in hiring means evaluators favour or disfavour candidates based on group cues like a name, even when they believe they are being objective. Field experiments show this is real and has not faded over time. Structured processes such as standard questions, scorecards, and work samples reduce bias better than one-off awareness training, which raises awareness but has very limited evidence of changing hiring behaviour.

Unconscious bias in hiring is when an evaluator's automatic associations about a social group shift their judgment of a candidate, without the evaluator intending it or noticing it. It shows up as a gap in how identical candidates are treated when the only difference is a name, an age, or a parental status. The strongest evidence comes from field experiments that send out fictitious but equivalent applications and measure who gets called back.

This post sets out what the research shows, including where the evidence is strong and where it is mixed. Bias is well-documented. The popular fix, a one-off unconscious bias training session, has weak evidence behind it. The levers that hold up better are changes to the hiring process itself.

Key takeaways

Identical resumes get different results based on the name on them. Bertrand and Mullainathan (2004) found white-sounding names got 50% more callbacks than Black-sounding names.
Discrimination has not faded. A meta-analysis by Quillian and colleagues (2017) found no decline in hiring discrimination against Black applicants over 25 years.
Unconscious bias training raises awareness but has very limited evidence of changing behaviour. The EHRC review (Atewologun, Cornish and Tresh, 2018) warned against relying on one-off training.
The Implicit Association Test (IAT) correlates only weakly with real behaviour. Treat IAT scores with caution (Oswald and colleagues, 2013).
Changing the process is a better-tested lever than changing minds. Structured interviews, scorecards, and work samples have firmer evidence behind them than awareness training.
Blind hiring is not a guaranteed fix. The Australian BETA trial (2017) found that anonymising applications did not improve diversity outcomes; its effect depends on context.

Identical resumes, unequal callbacks. In Bertrand and Mullainathan's field experiment, the only difference between two otherwise identical applications was the applicant's name. Resumes with white-sounding names drew a 9.65% callback rate versus 6.45% for Black-sounding names, about 50% more callbacks for the same qualifications.

Why it matters

Recruiters and employers make high-stakes decisions about people based on thin information, often a CV scanned in seconds. If a name or an age quietly shifts those decisions, qualified candidates get screened out for reasons that have nothing to do with the job. That is unfair, and it wastes talent your client is paying you to find.

It also matters because money and goodwill go into interventions that may not work. If you spend your budget on awareness training that does not change behaviour, you have not reduced bias. Knowing what the evidence supports helps you spend effort where it counts: on the structure of how you screen, interview, and score people.

What the studies show

What unconscious bias means

Unconscious bias, also called implicit bias, is an automatic and unintentional mental association about a social group that can shape decisions without conscious awareness. In hiring it means favouring or disfavouring candidates based on group cues such as a name signalling race or gender, even when the evaluator believes they are being objective. The Implicit Association Test (IAT) is the common lab measure, but its real-world predictive power is weak (Oswald and colleagues, 2013).

Bertrand and Mullainathan (2004): the identical-resume callback gap

Researchers replied to help-wanted ads in Boston and Chicago with about 5,000 fictitious resumes, randomly assigning white-sounding names (Emily, Greg) or Black-sounding names (Lakisha, Jamal). Resumes with white-sounding names received 50% more callbacks, roughly 9.65% versus 6.45%, or about 1 in 10 versus 1 in 15. The gap was equal to about eight extra years of experience. A stronger resume raised callbacks for white-sounding names far more than for Black-sounding names.
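The "50% more callbacks" figure is a ratio of callback rates, not a gap in percentage points. A minimal Python sketch of that arithmetic, using only the two rates quoted above (the function names are illustrative, not taken from the study):

```python
def relative_gap(rate_a: float, rate_b: float) -> float:
    """How much higher rate_a is than rate_b, as a fraction of rate_b."""
    return rate_a / rate_b - 1


def applications_per_callback(rate: float) -> float:
    """Average number of applications needed to get one callback at this rate."""
    return 1 / rate


white_rate = 0.0965  # callback rate for white-sounding names
black_rate = 0.0645  # callback rate for Black-sounding names

print(f"Relative gap: {relative_gap(white_rate, black_rate):.0%}")           # 50%
print(f"Percentage-point gap: {(white_rate - black_rate) * 100:.1f}")        # 3.2
print(f"Applications per callback: {applications_per_callback(white_rate):.1f} "
      f"vs {applications_per_callback(black_rate):.1f}")                     # 10.4 vs 15.5
```

The same logic applies, inverted, to the UK GEMM finding that ethnic-minority applicants needed about 60% more applications for the same callbacks: that is equivalent to white British applicants getting about 1.6 times as many callbacks per application.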

Goldin and Rouse (2000): the orchestra blind-audition study

When major US orchestras moved to blind auditions, with musicians playing behind a screen, the share of women hired rose. The authors attributed roughly 30% or more of the increase in new female hires to the screen. This study changed the process rather than the evaluators. Later re-analyses have questioned the precision and magnitude of the estimates given small samples, so cite the numbers with caution.

Moss-Racusin and colleagues (2012): the faculty hiring experiment

Science faculty (n=127) at research-intensive universities rated a lab-manager application that was identical except for the name, randomly John or Jennifer. They rated the male applicant as more competent and hireable, were more willing to mentor him, and selected a higher starting salary (about $30,238 versus $26,508). Both male and female faculty showed the bias. Good intentions did not remove it.

Quillian and colleagues (2017): no decline over 25 years

A meta-analysis of every available US field experiment on hiring discrimination from 1989 to 2015 (24 studies, more than 54,000 applications) found no change in discrimination against Black applicants over the period. On average, white applicants got about 36% more callbacks than equally qualified Black applicants and about 24% more than Latino applicants. Discrimination against Latinos may have declined modestly. Against Black applicants it was flat.

The limits of unconscious bias training

A UK EHRC review of 18 studies (Atewologun, Cornish and Tresh, 2018) found training can raise awareness and reduce IAT scores, but there is very limited or no evidence it changes actual workplace behaviour, and some training can backfire. A meta-analysis of 492 studies (Forscher and colleagues, 2019) found changes to implicit measures are weak, often fade, and do not reliably translate into behaviour change.

Bias is not only race and gender

Correll, Benard and Paik (2007) documented a motherhood penalty: mothers were rated less competent and committed, recommended for lower salaries, and called back about half as often as identical childless women, while fathers were not penalised. Neumark, Burn and Button (2019) found age discrimination using over 40,000 applications, hitting older women hardest. Ameri and colleagues (2018) found applicants disclosing a disability got about 26% fewer employer responses.

What actually reduces bias

Step 1: Use structured interviews

Ask every candidate the same predetermined, job-related questions in the same order, and score answers against a defined rubric. Schmidt and Hunter (1998) reported structured interviews predict job performance better than unstructured ones (validity about 0.51 versus 0.38). A 2022 re-analysis by Sackett and colleagues argued some older figures were overstated, but structured interviews still ranked among the stronger, more reliable predictors.

Step 2: Score against a written scorecard

Define the competencies you will assess and the rating scale before you see candidates, then grade everyone on the same explicit criteria. A standardised scorecard leaves less room for irrelevant or biasing information to creep in, because each candidate is rated on the same defined competencies rather than on a general impression.
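A minimal sketch of what "define the criteria before you see candidates" can look like, assuming a hypothetical set of competencies and a 1 to 5 scale. The `Scorecard` class and the competency names are illustrative, not a standard tool; the point is that the criteria are fixed up front and every candidate's ratings are checked against the same list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    """Competencies and rating scale, fixed before any candidate is seen."""
    competencies: tuple[str, ...]
    min_score: int = 1
    max_score: int = 5

    def total(self, scores: dict[str, int]) -> int:
        """Check one candidate's ratings against the scorecard, then sum them."""
        missing = set(self.competencies) - set(scores)
        extra = set(scores) - set(self.competencies)
        if missing or extra:
            # Every candidate must be rated on exactly the same criteria.
            raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
        for name, value in scores.items():
            if not self.min_score <= value <= self.max_score:
                raise ValueError(f"{name}: {value} is outside the rating scale")
        return sum(scores.values())


# Hypothetical competencies, for illustration only.
card = Scorecard(competencies=("problem_solving", "communication", "domain_knowledge"))

print(card.total({"problem_solving": 4, "communication": 3, "domain_knowledge": 5}))  # 12
```

Rejecting a missing or extra criterion, rather than silently scoring it, is the design choice that keeps a "general impression" from slipping back in as an unlisted factor.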

Step 3: Add work-sample tests

Have candidates perform tasks that simulate the actual job, such as a short coding exercise or writing task. Work samples were among the most predictive methods in Schmidt and Hunter (1998). They shift attention from resume cues like a name toward job-relevant performance you can observe directly.

Step 4: Compare candidates side by side

Bohnet, van Geen and Bazerman (2016) found that evaluating candidates jointly and comparatively, rather than one at a time, reduced the pull of gender stereotypes. Joint evaluation focuses attention on job-relevant performance data instead of group-based shortcuts. Changing the decision process is a more reliable lever than trying to change individual minds.

Step 5: Consider anonymising early screening, with eyes open

Removing name, gender, and similar cues at the sifting stage targets name-based bias directly, which is why the UK adopted name-blind recruitment commitments in 2015. It is not guaranteed to help. The Australian BETA trial (2017), with about 2,100 assessors and over 2,000 applications, found de-identifying applications did not improve, and in some analyses slightly reduced, shortlisting of women and minority candidates. Treat blind hiring as context-dependent and measure your own results.
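A minimal sketch of name-blind sifting, assuming candidate records arrive as Python dictionaries. The field names in `IDENTIFYING_FIELDS` are hypothetical and depend on your own application form. This removes labelled fields only; it will not catch identifying details buried in free text such as a cover letter.

```python
# Hypothetical field names; adjust to match your own application form.
IDENTIFYING_FIELDS = frozenset(
    {"name", "email", "phone", "gender", "date_of_birth", "photo_url"}
)


def anonymise(record: dict, reference: str) -> dict:
    """Return a copy of a candidate record with identifying fields removed.

    A neutral reference replaces the name so sifters can still refer to the
    candidate. The original record is left unchanged, and any other fields,
    including free text, are passed through as-is.
    """
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["candidate_ref"] = reference
    return cleaned


candidate = {
    "name": "Jennifer Smith",
    "email": "j.smith@example.com",
    "gender": "F",
    "years_experience": 6,
    "skills": ["SQL", "Python"],
}
print(anonymise(candidate, "C-017"))
# {'years_experience': 6, 'skills': ['SQL', 'Python'], 'candidate_ref': 'C-017'}
```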

Step 6: Use diverse panels as a supporting measure

A panel with evaluators from different backgrounds is sometimes used to dilute any single evaluator's biases. Treat this as a supporting structural step rather than a proven fix, since the evidence for its specific effect is suggestive rather than strong. Pair it with the structured methods above, which have firmer support.

Do this

Ask every candidate the same predetermined questions and score them against the same written criteria every time.
Cite studies with their authors and year so claims can be checked.
Treat field-experiment callback gaps as the strongest evidence of bias.
Measure your own hiring outcomes before and after any change, rather than assuming it worked.
Treat blind hiring as context-dependent and test it in your own setting.
Read IAT scores with caution, since they correlate only weakly with real behaviour.
Spend budget on process changes that are better tested than awareness training.
Watch for bias beyond race and gender, including age, parental status, and disability.
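Measuring your own outcomes can start as simply as comparing shortlisting rates by group before and after a process change. A minimal sketch, assuming you log how many applicants from each group applied and were shortlisted in each period; the group labels and every count below are made-up placeholders, not real data.

```python
def shortlist_rate(shortlisted: int, applied: int) -> float:
    """Share of applicants in a group who were shortlisted."""
    if applied == 0:
        raise ValueError("no applicants in this group")
    return shortlisted / applied


def rate_ratio(period: dict[str, tuple[int, int]], group: str, reference: str) -> float:
    """Shortlist rate of `group` relative to `reference` (1.0 means parity)."""
    return shortlist_rate(*period[group]) / shortlist_rate(*period[reference])


# Placeholder counts: group -> (shortlisted, applied). Replace with your own logs.
before = {"group_a": (30, 200), "group_b": (15, 150)}
after = {"group_a": (28, 200), "group_b": (20, 150)}

for label, period in (("before", before), ("after", after)):
    print(label, round(rate_ratio(period, "group_b", "group_a"), 2))
# before 0.67
# after 0.95
```

With small counts a shift like this can easily be noise, so track it over several hiring rounds, or run a proper significance test, before concluding that a change worked.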

Common mistakes to avoid

Treating one-off bias training as the fix

The EHRC review (Atewologun, Cornish and Tresh, 2018) found training can raise awareness but has very limited or no evidence of changing actual behaviour, and can backfire. Do not assume a single session reduces discrimination. Pair any training with structural changes that are better supported.

Assuming blind hiring always helps

The Australian BETA trial (2017) found anonymising applications did not increase, and in some analyses slightly reduced, shortlisting of women and minority candidates in that setting. Blind hiring targets name-based bias, but its effect depends on context. Test it and measure outcomes rather than rolling it out as a guaranteed win.

Using IAT scores to judge individuals

The IAT has modest test-retest reliability and, per Oswald and colleagues (2013), correlates only weakly with actual discriminatory behaviour. It is a poor basis for claiming a specific person is biased or for proving an intervention worked. Do not lean on IAT scores as your main evidence.

Believing discrimination has already faded

Quillian and colleagues (2017) found no decline in hiring discrimination against Black applicants over 25 years. The UK GEMM study (Heath and Di Stasio, 2019) found ethnic-minority applicants needed about 60% more applications to get the same callbacks. Assuming the problem is solved leads to dropping the very processes that help.

Relying on good intentions

In Moss-Racusin and colleagues (2012), both male and female faculty rated the male applicant higher despite believing they were objective. Bias does not require ill intent. A sincere commitment to fairness does not, on its own, remove the gap. Structure does more than goodwill.

Overstating the famous numbers

Some headline figures, like the orchestra blind-audition estimate (Goldin and Rouse, 2000), have been questioned in later re-analyses given small samples. Quote strong findings, but flag the caveats. Overclaiming undermines the credible core of the evidence.

Frequently asked questions

What is unconscious bias in hiring?

Unconscious bias in hiring is when automatic, unintentional associations about a social group shift how an evaluator judges a candidate, without the evaluator being aware of it. It shows up as different treatment of otherwise-identical candidates based on cues like a name, age, or parental status. Bertrand and Mullainathan (2004) demonstrated it by finding white-sounding names got 50% more callbacks than identical resumes with Black-sounding names.

Does unconscious bias training work?

Unconscious bias training can raise awareness and lower IAT scores, but the evidence that it changes actual hiring behaviour is very limited. The UK EHRC review (Atewologun, Cornish and Tresh, 2018) cautioned against using one-off training to change behaviour, and a 492-study meta-analysis (Forscher and colleagues, 2019) found changes to implicit measures are weak and often do not translate into behaviour change. Some training can even backfire.

What actually reduces bias in hiring?

Changing the process reduces bias more reliably than trying to change minds. Structured interviews, written scorecards, and work-sample tests have stronger evidence (Schmidt and Hunter, 1998; Sackett and colleagues, 2022). Comparing candidates side by side rather than one at a time also reduced gender stereotyping (Bohnet, van Geen and Bazerman, 2016).

Does blind or anonymised hiring fix bias?

Blind hiring does not fix bias reliably. It targets name-based bias and prompted policies such as the UK's 2015 name-blind recruitment commitments. But the Australian BETA trial (2017) found anonymising applications did not improve, and in some analyses slightly reduced, shortlisting of women and minority candidates. Its effect depends on context, so test it and measure your own outcomes.

Has hiring discrimination declined over time?

Hiring discrimination has not declined against Black applicants in the US. A meta-analysis by Quillian and colleagues (2017) of field experiments from 1989 to 2015 found no change, with white applicants receiving about 36% more callbacks than equally qualified Black applicants. Discrimination against Latinos may have declined modestly. A UK study (Heath and Di Stasio, 2019) found minority applicants needed about 60% more applications to get the same callbacks.

Is the Implicit Association Test reliable?

Use the IAT with caution. It has modest test-retest reliability, often around r=0.5 or lower, and a meta-analysis (Oswald and colleagues, 2013) found its scores correlate only weakly with actual discriminatory behaviour. That makes it a poor basis for predicting an individual's real-world bias or for judging whether an intervention worked.

The bottom line

The research gives a clear, balanced picture. Unconscious bias in hiring is real and well-documented in field experiments, from the identical-resume callback gap (Bertrand and Mullainathan, 2004) to the motherhood penalty (Correll, Benard and Paik, 2007), and it has not faded over time (Quillian and colleagues, 2017). The popular fix is the weakest part: one-off awareness training raises awareness but has very limited evidence of changing behaviour, and blind hiring depends on context rather than being a guaranteed win. The stronger evidence sits with changing the process: structured interviews, scorecards, work samples, and side-by-side evaluation.

The core work is the hiring process itself: same questions, same criteria, scored the same way, with your own outcomes measured over time. Consistent CV structure and an option to anonymise can support those choices, which is one reason tools like RefineCV exist, but the process is what reduces bias.

Consistent CVs, with an anonymise option

Structure is what reduces bias. RefineCV formats candidate CVs into one consistent template, with the option to anonymise. Try it free with 10 CVs, no credit card.


Sources

Bertrand & Mullainathan, "Are Emily and Greg More Employable Than Lakisha and Jamal?", American Economic Review 94(4) (2004-09): Resumes with white-sounding names received 50% more callbacks than identical resumes with Black-sounding names (about 9.65% vs 6.45%); the gap equalled about eight additional years of experience.
Goldin & Rouse, "Orchestrating Impartiality", American Economic Review 90(4) (2000-09): Blind auditions behind a screen helped explain the rise in women hired by major US orchestras; the authors attributed roughly 30% or more of the increase to the screen, though later re-analyses have questioned the precision given small samples.
Moss-Racusin et al., "Science faculty's subtle gender biases favor male students", PNAS 109(41) (2012-09-17): Science faculty (n=127) rated an identical lab-manager application as more competent and hireable, and selected a higher starting salary (about $30,238 vs $26,508), when the name was male rather than female; both male and female faculty showed the bias.
Quillian, Pager, Hexel & Midtboen, "Meta-analysis of field experiments shows no change in racial discrimination in hiring over time", PNAS 114(41) (2017-09-12): A meta-analysis of US field experiments from 1989 to 2015 (24 studies, more than 54,000 applications) found no decline in discrimination against Black applicants; white applicants received about 36% more callbacks than equally qualified Black applicants and about 24% more than Latino applicants.
Atewologun, Cornish & Tresh, "Unconscious bias training: An assessment of the evidence for effectiveness", EHRC Research Report 113 (2018-03): A UK EHRC review of 18 studies found unconscious bias training can raise awareness and reduce IAT scores but has very limited or no evidence of changing actual behaviour, and can backfire; it cautioned against one-off training as a way to change behaviour.
Forscher et al., "A meta-analysis of procedures to change implicit measures", Journal of Personality and Social Psychology 117(3) (2019-06): A meta-analysis of 492 studies found interventions can change implicit measures, but these changes are typically weak, often do not persist, and do not reliably translate into changes in behaviour.
Behavioural Economics Team of the Australian Government (BETA), "Going blind to see more clearly" (2017-06): An Australian randomised trial (roughly 2,100 assessors, over 2,000 applications) found de-identifying applications did not increase, and in some analyses slightly reduced, shortlisting of women and minority candidates; BETA concluded blind recruitment should not be assumed to improve diversity outcomes.
Schmidt & Hunter, "The Validity and Utility of Selection Methods in Personnel Psychology", Psychological Bulletin 124(2) (1998-09): Structured employment interviews showed higher predictive validity for job performance than unstructured ones (about 0.51 vs 0.38), and work-sample tests were among the most predictive methods.
Sackett, Zhang, Berry & Lievens, "Revisiting meta-analytic estimates of validity in personnel selection", Journal of Applied Psychology 107(11) (2022-11): A 2022 re-analysis argued earlier range-restriction corrections overstated some selection-method validities, but structured interviews and other structured assessments remained among the most valid predictors of job performance.
Correll, Benard & Paik, "Getting a Job: Is There a Motherhood Penalty?", American Journal of Sociology 112(5) (2007-03): Mothers were rated less competent and committed and recommended for lower salaries than identical non-mothers; a paired audit found mothers were called back about half as often as equally qualified childless women, while fathers were not penalised.
UK Government (Cabinet Office), name-blind recruitment announcement (2015-10-26): The UK Government adopted name-blind recruitment commitments in 2015, under which the Civil Service and major employers agreed to remove names from applications at the sifting stage.
Heath & Di Stasio (Centre for Social Investigation, Nuffield College, Oxford), GEMM correspondence study (2019-01): A UK correspondence study found ethnic-minority applicants needed to send around 60% more applications than white British applicants to receive the same number of callbacks.
Neumark, Burn & Button, "Is It Harder for Older Workers to Find Jobs?", Journal of Political Economy 127(2) / NBER WP 21669 (2019-04): A field experiment sending more than 40,000 fictitious applications found robust age discrimination in hiring, with older applicants, especially older women, receiving fewer callbacks.
Ameri et al., "The Disability Employment Puzzle: A Field Experiment on Employer Hiring Behavior", ILR Review 71(2) / NBER WP 21560 (2018-03): Applicants who disclosed a disability received about 26% fewer expressions of employer interest than otherwise-identical applicants without a disability.
Oswald et al., "Predicting Ethnic and Racial Discrimination: A Meta-Analysis of IAT Criterion Studies", Journal of Personality and Social Psychology 105(2) (2013-08): The IAT has modest test-retest reliability (often around r=0.5 or lower) and its scores correlate only weakly with actual discriminatory behaviour, making it a poor basis for predicting individual real-world bias.
Bohnet, van Geen & Bazerman, "When Performance Trumps Gender Bias: Joint vs. Separate Evaluation", Management Science 62(5) (2016-05): Evaluating candidates jointly and comparatively rather than one at a time reduced the influence of gender stereotypes by focusing attention on job-relevant performance data.