Stephen J. Ceci and Wendy M. Williams
Part 1 of this blog described surprising new findings from our research showing faculty prefer women over identically-qualified men for an assistant professorship (link). This challenges the common belief that women face bias during hiring. Part 2 addresses criticisms of our study.
Experimental procedure differed from actual academic hiring
In four of five experiments, we asked faculty to evaluate three hypothetical finalists for a starting professorship in their department. They were told their colleagues had read these candidates' CVs and recommendation letters and attended their talks, ultimately rating two finalists as equally strong and the third as slightly weaker. Faculty were given their colleagues' narrative evaluations of these finalists along with ratings on a 10-point scale. Some said our experiments did not resemble hiring decisions because faculty were not given CVs, did not meet the finalists, and did not attend their talks (link). Such criticism misses the point of the experiments.
Our goal was to conduct a controlled experiment to reveal attitudes about gender. This task would not be possible using real applicants giving real talks, since so many variables would be confounded. When past research using hypothetical candidates revealed anti-female bias in the hiring of assistant professors in the female-dominated field of psychology, or in the hiring of staff lab managers, the criticism that faculty had not met with the applicants was never raised. Perhaps this was because those earlier studies reached conclusions critics found more personally appealing?
Some claim faculty in real-world hiring situations may not prefer women and in fact may be biased against them. However, data on who is actually hired for U.S. STEM assistant professorships show clearly that women are hired at a higher rate than men. We are not referring to experiments--we are referring to real hires of real people; women are preferred in actual professorial hiring. Our article described these real-world data documenting that women are hired more often than men.
The purpose of our experiments was to determine whether women's advantage results from women being stronger than men, as many argue. To test this hypothesis, the man and woman needed to be identically qualified. We explained why CVs are inappropriate when examining diverse fields and institutions: Fields and institutions differ in what they consider excellent in terms of numbers and types of publications. A CV viewed as excellent at small teaching-intensive colleges may not be viewed as excellent at large doctoral-intensive institutions. Subfields within fields also differ in how a CV is viewed: e.g., "proceedings" publications may be viewed differently across subfields; an excellent CV in developmental psychology will not have enough publications to be competitive in social psychology (link).
Customizing dozens of CVs for every size and type of institution -- and all fields and subfields -- would introduce marked noncomparability, so we instead created a single narrative summary that did not list specific numbers or types of publications. It summarized colleagues' evaluations of finalists, based on CVs and job talks: "Based on her vita, letters of recommendation, and their own reading of her work, the search committee rated her research record as extremely strong." These summaries allowed the 838 faculty (in Experiments 1-3 and 5) to fill in the number and type of publications an extremely strong finalist would need in their own department, institution, and subfield. In Experiment 4, we gave 35 mechanical engineering faculty real CVs in order to compare their rankings to those made by mechanical engineers who used summaries; these CV-equipped faculty strongly preferred the woman's CV over the identical CV bearing a man's name. Thus, we did not try to mimic actual interviews to see whether women were preferred, because a preference for women in actual interviews was documented long before we designed our experiments. Our goal was to learn whether this advantage was due to women being stronger, and it turned out the advantage was due simply to being female.
Finalists were too qualified
Some argued our finalists were unrealistically strong (link). Finalists were rated between 9.3 and 9.5 on a 10-point scale (on which 9 is extremely impressive and 10 is truly exceptional). Some believe bias against women occurs when women are "ambiguously competent," not when they are unambiguously excellent. But ambiguous competence is not found in finalists for tenure-track professorships. No one hires scholars for the precious few tenure-track jobs unless they are stellar, especially in today's buyer's market. Tenure-track posts often draw 50 to 300 applicants (our recent search generated 267 applicants, and several faculty in our study reported similar numbers). All tenure-track applicants have successfully completed doctoral programs, earned publications, and garnered strong letters of recommendation. When hundreds of such applicants are whittled down to the top three finalists, those finalists are without question truly exceptional. In fact, the top 30 to 40 applicants are usually all extraordinary.
Evidence from our final two experiments further undermines the criticism that finalists were too qualified: 1) the CVs we used in Experiment 4 were real CVs--and there is nothing ambiguously competent about them; and 2) in Experiment 5, faculty rated applicants' strength using the same summaries used in Experiments 1-3. They rated them not 9.3-9.5 but 7.14 for males and 8.20 for females -- still excellent ratings, but with ample room to rate the candidates even higher had faculty wished. Thus, the preference for women in our study was not due to finalists being unrealistically excellent. Indeed, depicting lesser competence would have been unrealistic.
Faculty guessed the study's aim
Commentators claimed faculty were aware we were studying sexist hiring and chose women to appear politically correct. Four sources of evidence refute this claim. First, 30 faculty were asked to guess what the study was about; none guessed correctly. Second, in Experiment 5, faculty were given only a single applicant to rate, either male or female. They had no way of knowing that a mirror-image applicant, identical except for gender, had been sent to other faculty. Thus, faculty rating the male applicant could not have downgraded him to 7.14 on the assumption that some unknown faculty member elsewhere would upgrade a female applicant to 8.20. Third, if faculty knew our purpose was to determine whether they are biased, they should have given the same rank to the identically-qualified man and woman (i.e., tied them for first place). Only a handful of faculty chose this option. Fourth, if respondents knew the purpose of the study, why was there no female preference in some conditions? It seems implausible that faculty knew the hypothesis but acted on it only occasionally.