It’s a little disconcerting to see a putative scientist write an ode to the virtues of trying not to discover things, but here we are.
Cathy O’Neil had a job to do. The job was to take other people’s money, and allocate it to provide the greatest reduction in homelessness per dollar. In order to figure out what the best use of that money was, she was supposed to figure out what the payoff was for different sorts of interventions on different sorts of people.
But O’Neil decided to do a different job. She took her pre-existing ideas about the relationship between homelessness and race (the implicit premise, though she never says it outright, being that services to address homelessness are less effective on blacks), and on that basis decided to make her model less accurate, in order to obscure that potential effect.
Why is it important to obscure that effect? Well, you see:
“since an algorithm cannot see the difference between patterns that are based on injustice and patterns that are based on traffic, choosing race as a characteristic in our model would have been unethical.”
Which doesn’t make a whole lot of sense, given that “patterns based on injustice” is an incoherent category, and the ethical implications of ignoring these patterns are fuzzy at best. Is the relationship between homelessness and income “based on injustice”? Gender? Age? Education? Can we use these as variables? If the existence of homelessness in the first place is in some sense “unjust”, does one get to try to solve the problem at all? If it turns out homeless services are actually more effective on blacks, does the ethical polarity switch, or do ethics require we not correct the “injustice” that produced the data?
What does she think the data was collected for? Approximately every form I have ever filled out for the government has required race to be entered. Our elected representatives and unelected civil servants have decided it is very important to track the racial composition of people attending college, receiving home loans, purchasing firearms, registering cars, and renting mailboxes. Intentionally filling out the form incorrectly is usually a felony. Does she imagine that that data isn’t intended to be used? Or only used when you think the result will conform to your subjective preferences?
And finally, what is her model of ethics in the first place? We construct, for instance, fiduciary responsibility as an ethical obligation of a corporate officer, because otherwise we wouldn’t be able to trust them with our money. These things exist for the benefit of the profession and its customers jointly, so we don’t have to haggle over certain core obligations on a case-by-case basis - if I buy stock in a company, even a company like Facebook where one person dominates the voting rights, I know they are not supposed to benefit themselves at the expense of swindling minority shareholders, which would prevent a share of stock from having value in the first place. What is the construction by which scientists have an obligation to not make correct inferences if the “patterns … are based on injustice”?
Needless to say, her decision-making process, as she avoids thinking too hard about these questions or discovering anything she thinks she shouldn’t, is both the opposite of “science” and the opposite of serious inquiry into one’s ethical obligations.
If O’Neil had wanted to produce a “race-neutral” program, without the sticky issue of more-or-less perfect correlates of race (like zip code in her segregated metropolis - a variable she never tells us whether she ultimately found “ethical” to use), the appropriate thing to do is to control for race by including it in the model, and then make the policy decision to neutralize the effect of that variable. There are several ways to do this. In some model structures, one can manually set the effect of a variable to zero after the fact (which has the secondary benefit of letting, e.g., zip code account for the effect of geography qua geography, rather than of geography plus a proxy for race). Or one can apply a differential that holds per-group effects at the same level while still accounting for different patterns of within-group variation. Maybe you even run the numbers with and without the variable, and decide that, in the end, avoiding the offense to your sensibilities is worth some extra person-years of homelessness.
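The first of those options - fit the model with race included so its effect is estimated explicitly, then zero out its coefficient before scoring anyone - can be sketched in a few lines. This is a minimal illustration with made-up data and a simple linear model, not a claim about how O’Neil’s actual system worked; every variable and coefficient here is hypothetical.

```python
import numpy as np

# Hypothetical data, purely for illustration: predictors are income, zip-code
# effect, and a simplified binary race indicator; the outcome is some measure
# of intervention payoff. All numbers below are invented.
rng = np.random.default_rng(0)
n = 500
income = rng.normal(0.0, 1.0, n)
zip_effect = rng.normal(0.0, 1.0, n)
race = rng.integers(0, 2, n).astype(float)
y = 2.0 - 0.8 * income + 0.5 * zip_effect + 0.3 * race + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), income, zip_effect, race])

# Step 1: fit WITH the race variable, so its effect is estimated explicitly
# rather than leaking into correlated proxies like zip code.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 2: as an explicit policy decision, zero out that coefficient before
# using the model to score anyone.
beta_neutral = beta.copy()
beta_neutral[3] = 0.0

predictions = X @ beta_neutral  # race now contributes nothing to the score
```

The point of the two-step structure is that the model still *knows* the effect (it is sitting in `beta[3]`, available for inspection), but the scoring rule is provably invariant to race: flip the race column for every record and the predictions do not change.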
But O’Neil doesn’t want a neutral program, and she certainly doesn’t want to take responsibility for an “ethical policy” of ignoring a known effect of known size.
She simply prefers not to know.