Mathematics is fast becoming one of the most important tools in crime detection. Where once a Sherlock Holmes would have had to be content with a magnifying glass, or a jury with gut instinct and rational discussion, now a range of methods from probability and statistics is available to help. Today, mathematics lies behind expert conclusions on a hundred forensic matters, from fingerprints to DNA.
Statistics can be a precious tool for identifying the patterns behind confusing or misleading phenomena. The University of California at Berkeley was sued for gender bias when it was observed that just 35 percent of female applicants to graduate school were being accepted, versus 44 percent of males. The investigators began by narrowing the problem down to six major departments which, taken together, showed an even more incriminating gap: 46 percent of male applicants accepted versus just 30 percent of female applicants. But a department-by-department analysis then showed the opposite of a bias against women: in four of the six departments, women were actually accepted at a higher rate than men, and in the other two, the acceptance rates for men and women were 37 versus 34 percent and 28 versus 24 percent, differences too small to account for the overall appearance of inequality.
This curious phenomenon, known as Simpson's paradox, shows up in all kinds of situations. For example, a recent analysis of national SAT scores showed an improvement in the average score of every single ethnic group, yet the overall average had not budged by a single point in 20 years. Another example was a particular treatment for kidney stones whose success rate in a controlled study was higher than that of all other treatments, in spite of doctors' observations to the contrary.
These situations sound paradoxical, yet they all really happen. To explain them, consider a simplified version of the Berkeley example. Imagine a tiny school with just two departments: an engineering department that receives 10 applications from women and 50 from men, and a humanities department that receives 30 applications from women and 10 from men. In engineering, the school accepts 90 percent of the women and 80 percent of the men, and in humanities, 30 percent of the women and 20 percent of the men; no bias against women is apparent (if anything, it goes the other way). But in total, 9 + 9 = 18 women are accepted out of the 40 female applicants, and 40 + 2 = 42 men out of the 60 male applicants, which makes the overall acceptance rate 42/60 = 70 percent for men but just 18/40 = 45 percent for women! Now it does look like sex bias, yet we have seen that it is not. What is going on is that a hidden variable plays an important role: one has to take into account not just acceptance rates but also how many applicants of each sex apply to each department. Here most women applied to humanities, the department with the lower acceptance rates, while most men applied to engineering, where acceptance rates are high. In the SAT case above, the hidden variable was the size of each ethnic group: certain lower-scoring groups had grown greatly over the years relative to the higher-scoring groups, which kept the overall average from rising even though the average of each group did rise. In the kidney stone example, it turned out to be the size of the kidney stones that mattered.
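The arithmetic of the two-department school can be checked in a few lines. This is a minimal Python sketch; the dictionary layout and the function names `accepted` and `overall_rate` are illustrative choices, not anything taken from the Berkeley study. It prints each department's acceptance rates, then pools all applicants across departments, at which point the ordering between women and men flips.

```python
# Hypothetical two-department school from the example above:
# for each department and sex, (number of applicants, acceptance rate).
school = {
    "engineering": {"women": (10, 0.90), "men": (50, 0.80)},
    "humanities": {"women": (30, 0.30), "men": (10, 0.20)},
}


def accepted(applicants: int, rate: float) -> int:
    """Number of applicants accepted, rounded to a whole person."""
    return round(applicants * rate)


def overall_rate(sex: str) -> float:
    """Pool every department and return the school-wide acceptance rate for one sex."""
    total_applied = sum(dept[sex][0] for dept in school.values())
    total_accepted = sum(accepted(*dept[sex]) for dept in school.values())
    return total_accepted / total_applied


for name, dept in school.items():
    # Within each department, women are accepted at the higher rate.
    print(f"{name}: women {dept['women'][1]:.0%}, men {dept['men'][1]:.0%}")

# Pooled over the whole school, the ordering reverses.
print(f"overall: women {overall_rate('women'):.0%}, men {overall_rate('men'):.0%}")
```

Pooling simply adds up counts across departments, and that is exactly where the hidden variable (how applicants are spread among departments) slips in.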
Another area that can involve rather subtle mathematics is DNA identification. For detection purposes, thirteen particular locations (loci) are examined amongst the many thousands of sites in our DNA; at each of them a person carries a pair of variants, one inherited from each parent. These pairs vary so much from person to person that the estimated chance of two people (other than identical twins) sharing all thirteen is just one in 400 trillion, a number far greater than the population of the world. Thus, when forensic biologists have a good-quality sample to work with, they can make an essentially unchallengeable identification. But they often have to work with crime scene samples that are very tiny, mixed with other people's DNA, or degraded. In these cases, a sample can be attributed to a given individual only with a certain probability, and it is essential to interpret this probability correctly.
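A figure like one in 400 trillion is obtained by multiplying together how common each locus's genotype is, the so-called product rule, which assumes the thirteen loci are inherited independently. The Python sketch below is a deliberate simplification under that assumption: it gives every locus the same hypothetical frequency and solves for the value that reproduces the quoted figure. Real casework uses a separate frequency for each locus, drawn from population databases, plus corrections that this sketch ignores. The helper name `combined_match_probability` is chosen for illustration only.

```python
import math

# Quoted figure: a one in 400 trillion chance that two unrelated people share all thirteen loci.
QUOTED_ODDS = 400e12
NUM_LOCI = 13


def combined_match_probability(per_locus: list[float]) -> float:
    """Product rule: multiply per-locus genotype frequencies, assuming independent loci."""
    return math.prod(per_locus)


# Hypothetical simplification: every locus gets the same genotype frequency,
# chosen so that thirteen of them multiply to the quoted figure.
per_locus_freq = (1 / QUOTED_ODDS) ** (1 / NUM_LOCI)
print(f"per-locus frequency: {per_locus_freq:.3f} (about 1 in {1 / per_locus_freq:.0f})")

combined = combined_match_probability([per_locus_freq] * NUM_LOCI)
print(f"combined: about 1 in {1 / combined / 1e12:.0f} trillion")
```

The per-locus frequency comes out at about 0.075, roughly 1 in 13: no single locus is very distinctive, but thirteen independent ones multiply into an enormous number. The same multiplication explains why a degraded sample, in which fewer loci can be read, yields a far less impressive figure.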
A man was recently tried in San Francisco for a rape and murder committed some 30 years earlier, on the strength of a DNA match between a semen sample stored in the cold-case files and an entry in a database of California sex offenders. However, the crime sample was degraded, so that it would match about one person in a million, roughly 300 people across the country's population. There was virtually no other evidence against the defendant.
The defense argued that with a one in a million chance of a match in the general population, running the sample through a database of about one-third of a million individuals gave a chance of 1 in 3 of finding a random match to an innocent person. The prosecution, for its part, cited the one in a million figure, which runs the risk of being misinterpreted as the probability that the defendant is innocent (the "prosecutor's fallacy"). The trouble is that both conclusions are wrong. The defense argument ignores two essential facts: first, that the roughly 300 matching individuals are spread across all ages and all regions of the country, not concentrated in a database of California sex offenders; and second, the non-negligible probability that the actual murderer was in the database for other offenses. The prosecution, when citing the one in a million figure, must make clear that the DNA alone narrows the pool of potential murderers down only to about 300 individuals. It must then use the fact that the single database match was to a man who shared several characteristics with the original murderer, namely age, race (according to an eyewitness statement), location, and being a sex offender (whether registered or not), to narrow the field further. Using these factors, the probability that the defendant is innocent can be estimated at less than about one in seventy.
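The competing figures in this case can be reproduced with a short Python sketch. It uses only the numbers in the text plus two labeled assumptions: a population of 300 million (implied by one in a million giving roughly 300 matches) and a database size of 333,333 standing in for "about one-third of a million". The defense's 1 in 3 is really the expected number of coincidental hits, n × p; the probability of at least one such hit, assuming independent profiles, is a little lower, about 0.28. The last figure shows why the prosecutor's fallacy matters: if DNA were the only evidence and each of the roughly 300 matching people were equally likely to be the source, a match would still leave about a 299 in 300 chance that the person matched is not the source. The sketch does not attempt the one in seventy estimate, which depends on the additional evidence about age, race, location and offending history.

```python
# Figures from the text, with two assumptions flagged below.
RANDOM_MATCH_PROB = 1e-6    # chance that an unrelated person matches the degraded sample
POPULATION = 300_000_000    # assumption: implied by "one in a million, roughly 300 people"
DATABASE_SIZE = 333_333     # assumption: "about one-third of a million" offenders

# Expected number of people in the population whose DNA matches by chance.
expected_matches = RANDOM_MATCH_PROB * POPULATION

# Defense's figure: expected coincidental hits in the database,
# assuming nobody in the database is the true source.
expected_db_hits = RANDOM_MATCH_PROB * DATABASE_SIZE
# Probability of at least one coincidental hit, treating profiles as independent.
p_any_db_hit = 1 - (1 - RANDOM_MATCH_PROB) ** DATABASE_SIZE

# Prosecutor's fallacy: P(match | innocent) is not P(innocent | match).
# With DNA as the only evidence and every matching person equally likely to be
# the source, the chance that a given match is NOT the source:
p_innocent_given_match_dna_only = (expected_matches - 1) / expected_matches

print(f"matching people in population: {expected_matches:.0f}")
print(f"database: {expected_db_hits:.3f} expected hits, P(at least one) = {p_any_db_hit:.3f}")
print(f"P(match | innocent) = {RANDOM_MATCH_PROB:g}")
print(f"P(innocent | match), DNA alone = {p_innocent_given_match_dna_only:.3f}")
```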
The fate of defendants can hinge on such calculations being made rigorously. It is essential to examine the errors that are made most frequently, to learn to avoid them, and to establish controlled mathematical procedures that will be valid in a court of law.
Leila Schneps and Coralie Colmez are the authors of Math on Trial: How Numbers Get Used and Abused in the Courtroom, available now from Basic Books.