I heard someone get upset because researchers were using statistics to study differences between racial and cultural groups, sexes, and economic classes. Stated that way, it sounds worse than it actually is, but this was their interpretation of what the researchers were doing. The researchers were not looking for differences between subgroups; rather, subgroups emerged when differences were found. Properly phrased, the researchers were using statistics to study social issues and found differences between subgroups of a population.
The comments were made with reference to the book Freakonomics, which raises questions that deserve to be debated, but they reflect a deeper distrust of numbers and statistics. Yes, it’s true that statistics can be used to “lie”, but that does not imply that everything that comes out of statistics is a lie. Statistics is a tool, and the only way to defend yourself against an attack involving statistics is to educate yourself about statistical methodology. Failing that, take a statistical argument for what it’s worth, as evidence for a possible answer to a question, and engage with the idea being argued. Think critically, but don’t disregard ideas just because they use statistics.
As one friend pointed out, this “discrimination” between subgroups is needed to account for differences that are not the result of bias or preconditioned ideas on the part of the person doing the statistical analysis. Without it, we wouldn’t be able to identify discrimination based on race, sex, or economic class. And there’s the irony. Another point my friend raised was something known as Simpson’s Paradox, wherein an effect seen at the aggregate level can be reversed when the data are considered at the subgroup level. A great example is sex bias in graduate admissions to Berkeley in 1973: admission rates at the aggregate level showed bias against women, but at the departmental level the bias was actually (slightly) against men. It turns out women were applying to more competitive departments, ones that didn’t have the same degree of “prior screening” (e.g., they were applying to English rather than Engineering, which requires math).
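To see how such a reversal can happen, here is a small Python sketch. The numbers are made up purely for illustration and are not the actual Berkeley figures; the department labels "A" and "B" and the function names are hypothetical as well. Department A stands in for a less competitive department that mostly men apply to, and department B for a more competitive one that mostly women apply to.

```python
# Hypothetical counts chosen only to illustrate Simpson's Paradox.
# These are NOT the real 1973 Berkeley admissions figures.
# Each entry is (number applied, number admitted).
applications = {
    "A": {"men": (800, 480), "women": (100, 65)},   # less competitive department
    "B": {"men": (200, 40), "women": (900, 198)},   # more competitive department
}


def admission_rate(applied, admitted):
    """Fraction of applicants who were admitted."""
    return admitted / applied


def department_rates(data):
    """Admission rate for each sex within each department."""
    return {
        dept: {sex: admission_rate(*counts) for sex, counts in groups.items()}
        for dept, groups in data.items()
    }


def aggregate_rates(data):
    """Admission rate for each sex after pooling all departments together."""
    totals = {}
    for groups in data.values():
        for sex, (applied, admitted) in groups.items():
            prev_applied, prev_admitted = totals.get(sex, (0, 0))
            totals[sex] = (prev_applied + applied, prev_admitted + admitted)
    return {sex: admission_rate(*counts) for sex, counts in totals.items()}


by_dept = department_rates(applications)
overall = aggregate_rates(applications)

for dept, rates in by_dept.items():
    print(f"Department {dept}: men {rates['men']:.1%}, women {rates['women']:.1%}")
print(f"Overall: men {overall['men']:.1%}, women {overall['women']:.1%}")
```

With these invented counts, women are admitted at a slightly higher rate than men in each department (65% vs. 60% in A, 22% vs. 20% in B), yet at a much lower rate overall (26.3% vs. 52%), simply because most of the women applied to the department that admits few applicants of either sex.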
I also spoke to a couple of people who studied in the social sciences. An economics grad noted that it is dangerous not to break data into subgroups, as demonstrated by Simpson’s Paradox, and a psychology grad said their program taught that it is unethical to exclude subgroups from social science research. Regarding the latter point, imagine norms established using Caucasians and then applied to First Nations groups. We are not all the same. And although we are all unique individuals, we correlate with subgroups. There’s nothing inherently wrong with that observation.
I think the important point to note is that individuals may discriminate, but numbers don’t know race or sex. This is what makes statistical analysis so appealing: we can get around bias and preconceptions to evaluate data. But it is also why we have to be careful, when doing statistical analysis, not to inject such biases into our results. Ultimately, however, I think the comment that studying subgroups is discriminatory was born of the most innocent form of ignorance. Everyone needs a basic understanding of statistics, given the role numbers play in our lives, or at the very least tolerance of something that is not well understood.