Big Data’s File Drawer Problem

In social science, the File Drawer Problem refers to the fact that negative results are very rarely published. The problem is named for the hypothetical “file drawer” that every researcher has, where she keeps the failed experiments and research that never panned out. Having worked in the academic world, I can confirm these file drawers, both literal and figurative, do exist.

The reason this is problematic is that, for an article to be published, it generally has to show that its results are statistically significant. This is because statistical models can’t be proven right: a significance test can only show that the observed data would be unlikely if there were no real effect. Typically, the threshold to get published is the 5% significance level (often described as 95% confidence), which means that when there is no real effect, there is still a 1 in 20 chance the result appears significant.

The File Drawer Problem comes in because this threshold is only meaningful if researchers are equally likely to publish positive results that meet it and negative results that do not. However, a researcher who conducts 20 experiments on effects that don’t exist will, on average (and all else equal), find that one is significant by chance alone. If she only sends that one article off for publication, and puts the other 19 in her file drawer, then her findings aren’t actually valid: other researchers can’t rely on them. And if she’s not the only one (and there’s good evidence that she isn’t), then a sizeable part of the accepted knowledge in her discipline simply isn’t true.
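To make the arithmetic concrete, here is a minimal Python sketch. It assumes 20 independent experiments in which no real effect exists, so each one has a 5% chance of looking significant. The function names and the simulation setup are my own illustration, not taken from any particular study.

```python
import random

ALPHA = 0.05        # conventional 5% significance threshold
N_EXPERIMENTS = 20  # experiments per researcher, as in the example above


def expected_false_positives(n=N_EXPERIMENTS, alpha=ALPHA):
    """Average number of 'significant' results when no effect is real."""
    return n * alpha


def prob_at_least_one(n=N_EXPERIMENTS, alpha=ALPHA):
    """Chance that at least one of n null experiments looks significant."""
    return 1 - (1 - alpha) ** n


def simulate_researchers(n_researchers=100_000, n=N_EXPERIMENTS,
                         alpha=ALPHA, seed=0):
    """Share of simulated researchers with at least one publishable fluke.

    Assumption: under the null hypothesis a p-value is uniform on [0, 1],
    so each experiment is 'significant' with probability alpha.
    """
    rng = random.Random(seed)
    with_fluke = 0
    for _ in range(n_researchers):
        if any(rng.random() < alpha for _ in range(n)):
            with_fluke += 1
    return with_fluke / n_researchers


if __name__ == "__main__":
    print(expected_false_positives())        # one fluke per 20 experiments, on average
    print(round(prob_at_least_one(), 3))     # roughly 0.64
    print(round(simulate_researchers(), 3))  # should land close to the line above
```

Under these assumptions, roughly two out of three researchers who run 20 null experiments end up with at least one result that clears the bar, even though none of the effects is real.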

Several methods have been proposed to mitigate this problem, from requiring a higher level of significance, such as 99%, to doing away with significance levels altogether and using other forms of validation. However, no one has proposed updating a model’s findings after it has been accepted for publication. That’s where Big Data comes in.

In her recent book, Weapons of Math Destruction, Cathy O’Neil argues that machine learning models can become destructive when no one follows up on their conclusions to see if they were, in fact, correct. Admittedly, this can be a challenge: a model that reads resumes to select the best candidates can’t follow the people who weren’t selected and evaluate their performance at their next job. Often, however, people just don’t take the time. The result, O’Neil argues, is that these models create their own self-reinforcing version of the truth. In the terms I’ve used here, Big Data also has a File Drawer Problem.

One solution O’Neil offers is to build this type of validation into the model’s design. She notes that models like the Fair Isaac credit score (FICO) are continually updated with new information. If the FICO model predicts someone is a good credit risk, and they end up defaulting, it can update its algorithm. This isn’t perfect: someone rejected as a poor credit risk likely won’t have the chance to disprove the model, because no one will lend to them. Still, updating the model where possible helps keep its predictions in line with what actually happens: it mitigates, to some extent, the File Drawer Problem.
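The feedback loop, and its blind spot, can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how FICO or any real scoring system works: the class name, the score bands, and the bookkeeping are all hypothetical.

```python
from collections import defaultdict


class FeedbackTracker:
    """Toy tracker: observed default rate per score band (hypothetical design)."""

    def __init__(self):
        self.loans = defaultdict(int)     # approved loans per band
        self.defaults = defaultdict(int)  # defaults among approved loans
        self.rejected = defaultdict(int)  # applicants whose outcome is never seen

    def record(self, band, approved, defaulted=False):
        """Log one applicant. Outcomes exist only for approved applicants."""
        if not approved:
            self.rejected[band] += 1
            return
        self.loans[band] += 1
        if defaulted:
            self.defaults[band] += 1

    def observed_default_rate(self, band):
        """Return the default rate for a band, or None if nothing was observed."""
        n = self.loans[band]
        return self.defaults[band] / n if n else None


if __name__ == "__main__":
    tracker = FeedbackTracker()
    # Made-up applicants, purely for illustration.
    tracker.record("high", approved=True, defaulted=False)
    tracker.record("high", approved=True, defaulted=True)
    tracker.record("low", approved=False)
    print(tracker.observed_default_rate("high"))  # 0.5
    print(tracker.observed_default_rate("low"))   # None: no one lent, so no feedback
```

The `None` for the rejected band is the point: the model can correct itself where it said yes, but where it said no, the evidence stays in the file drawer.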

This modeling framework could also work in social science. In theory, researchers are supposed to test each other’s models with new data to see how well they perform. In practice, this rarely happens, both because this type of research is difficult to publish, and because it can potentially make someone powerful enemies. One alternative is for the researchers themselves to post a brief follow-up study comparing the model’s predictions to what actually happened. This would show the model’s strengths and weaknesses, and could also inform future research (though care would be needed here to avoid building a model solely to predict the observed data).
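A follow-up of this kind doesn’t have to be elaborate. As one possible sketch, assuming the original model produced probabilities for yes-or-no outcomes, a researcher could report the Brier score (the mean squared gap between predicted probability and what happened) alongside a naive baseline. The function names are my own, and the Brier score is just one reasonable choice of yardstick.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0 means every prediction was certain and correct.
    """
    if len(predicted_probs) != len(outcomes) or not outcomes:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predicted_probs, outcomes))
    return total / len(outcomes)


def baseline_brier(outcomes):
    """Score of a naive model that always predicts the observed base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    return brier_score([base_rate] * len(outcomes), outcomes)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    predictions = [0.9, 0.2, 0.7, 0.4]
    what_happened = [1, 0, 0, 1]
    print(brier_score(predictions, what_happened))
    print(baseline_brier(what_happened))
```

If the published model can’t beat the baseline on data it never saw, that is worth knowing, and worth saying in print.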

Will this happen? Probably not: “publish or perish” is more true than ever, and no one is going to risk tenure by publicizing flaws in their own work. However, something like this needs to be done if social science is going to compete in a marketplace of ideas that includes Big Data. Although the two have very different cultures, they share the same goal of making sense of the world. And it goes without saying that sense is something today’s world dearly needs.
