Saturday, March 10, 2012

Racist statistics

In the Western world, outright racism and sexism (which I will group together in this post under "racism") seem to be mostly eradicated, although we should stay vigilant: they may return sooner than we think. However, the fight against racism has not been won. In fact, the most difficult part of the fight against this despicable practice has only just begun: the fight against hidden racism. It is in this fight that statistics play an ambivalent role.

So what is hidden racism? For me, it is racism that is not bound by formal rules or explicit practices, but that operates beneath the surface. An example: suppose that every time you have to choose between a white candidate and a black candidate, all things being equal, you choose the white candidate 75% of the time. This process is obviously racist, since it treats white and black candidates differently. It is nevertheless very hard to prove, because no single decision is obviously racist on its own (after all, you sometimes do choose the black candidate). That is the problem with hidden racism.
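This is exactly where aggregation helps: one decision proves nothing, but many decisions together can. A small sketch in Python makes the point; the 75% figure and the sample sizes are just the illustrative assumptions from the example above, and the test is a plain one-sided binomial test against a fair 50/50 process.

```python
from math import comb

def p_value_at_least(k, n, p=0.5):
    """One-sided binomial p-value: P(X >= k) if each choice were fair with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A single biased choice proves nothing: picking the white candidate
# once out of one decision is perfectly consistent with a fair process.
print(round(p_value_at_least(1, 1), 3))   # 0.5

# But 75 white picks out of 100 decisions would be wildly improbable
# under fairness -- the hidden 75% bias becomes visible in aggregate.
print(p_value_at_least(75, 100) < 1e-6)   # True
```

The individual case stays ambiguous forever; only the pattern across many cases convicts.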

Statistics can help identify these cases. We know that, in general, women are still discriminated against in the job market: all things being equal, women often earn about 10% less than their male coworkers. But this is hard to judge on a case-by-case basis, since pay is rarely uniform; many jobs pay more depending on experience, extra tasks, and other factors. With a larger group of cases, however, statistical analysis can reveal a systematic distortion in wage levels. Large companies are vulnerable to this use of statistics; see, for example, the women vs. Walmart case. If we can eradicate sexism from large companies, we will have won a great battle in the fight against discrimination, and I believe smaller companies will eventually follow the big ones and discriminate less as well.
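To make this concrete, here is a toy simulation of the idea (every number in it, the 10% penalty, the salary scale, the noise level, is invented purely for illustration): individual wages are noisy enough that any single man/woman comparison is inconclusive, yet averaging over many employees recovers the gap.

```python
import random
random.seed(42)

def wage(experience, is_woman):
    base = 30000 + 2000 * experience          # pay rises with experience
    noise = random.gauss(0, 4000)             # bonuses, extra tasks, etc.
    penalty = 0.90 if is_woman else 1.00      # the hypothetical hidden 10% gap
    return base * penalty + noise

# Generate men and women with identical experience distributions.
men   = [wage(e % 20, False) for e in range(1000)]
women = [wage(e % 20, True)  for e in range(1000)]

# Any single pair is dominated by noise, but the group averages
# expose the distortion: the estimate lands close to 10%.
gap = 1 - (sum(women) / len(women)) / (sum(men) / len(men))
print(f"estimated gap: {gap:.1%}")
```

With a thousand employees per group the noise averages out and the hidden penalty reappears in the group means; this is the statistical weapon a large company is vulnerable to.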

Unfortunately, statistics can also be used to hide racism, or worse: to create it. A good example is the statistical algorithms that banks use to decide whom to lend to. These algorithms use a variety of factors, including zip code. Because zip code was apparently a good predictor of loan delinquency, entire zip codes were excluded, or "redlined", from obtaining mortgages and other credit. The main groups living in these zip codes, which covered the poorer neighborhoods of the larger American cities, were black and Latino communities. This is a deeply racist policy: since moving to a better, more expensive neighborhood is not easy, these people were effectively discriminated against from birth. The redlining example shows how a process that did not start out as racist (I hope) became racist through the use of statistical algorithms.
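The mechanism is worth spelling out: the lending rule never looks at race at all, yet still discriminates, because zip code acts as a proxy for it. A toy sketch (two hypothetical groups, ten made-up zip codes, and segregation that concentrates group B in the poorer zips) shows how a race-blind rule produces wildly unequal outcomes:

```python
import random
random.seed(0)

# Hypothetical population: group B is a 30% minority concentrated in
# the poorer zip codes 0-3; group A lives mostly in zip codes 2-9.
applicants = []
for _ in range(10000):
    group = "B" if random.random() < 0.3 else "A"
    zipcode = random.choice([0, 1, 2, 3]) if group == "B" else random.choice(range(2, 10))
    applicants.append((group, zipcode))

# The lending rule uses only zip code: zips 0-2 had high historical
# default rates (driven by poverty), so they are excluded wholesale.
redlined = {0, 1, 2}

def denial_rate(group):
    zips = [z for g, z in applicants if g == group]
    return sum(z in redlined for z in zips) / len(zips)

print(f"group A denied: {denial_rate('A'):.0%}")
print(f"group B denied: {denial_rate('B'):.0%}")
```

Group B is denied credit at several times the rate of group A even though the algorithm never saw a group label; the correlation between neighborhood and group does all the discriminatory work.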

This leads to a question: are statistics inherently good, bad, or neutral? As with many technologies, it all depends on how they are used. In the first example, statistics provide a weapon against widespread sexism, and are therefore clearly good. The second example is more ambiguous: even though the objective was not racist, the consequences were. I see this as a warning: we should always be vigilant that racism does not creep up on us, even when we operate with the best intentions.
