When I was in graduate school, we debated the merits of a centralized data store that policy makers could use to make better decisions; ultimately, we decided the risks to privacy outweighed the benefits.
Data collected by (ethical) businesses is de-identified, typically by assigning each case an arbitrary number. Government data isn’t, although as the article below points out, it could be. The bigger concern is that, unlike private organizations, the government can detain, arrest, and even execute people. On the one hand, none of these things happen without due process; on the other, power corrupts and, what is far more worrying, people make mistakes. Are we willing to accept that risk?
We might be, if it actually does lead to better policy: having worked for the government, I can tell you that we routinely made decisions based on what I’ll call sparse information. Several times, I had to request data from another state agency, and each time we had to draft an agreement specifying precisely what my agency could do with it. And there’s no data standardization across agencies, so sometimes, after going through all this, I couldn’t merge the two data sets.
Which raises another issue: to make this work, each agency would have to use the same data semantics, file structure, and database application. Even in the ideal case, where everyone can agree on a common data dictionary, each agency’s ability to contribute data will be limited by its own architecture. And, as the second link makes clear, things are usually not ideal.
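To make the data-dictionary point concrete, here’s a minimal Python sketch, assuming two hypothetical agencies that record the same person under different field names (`client_no` vs. `case_id`) and different date formats. The shared dictionary, the field names, and the helpers `to_common` and `merge` are all invented for illustration; real agency schemas, and the agreements governing what may be joined, would be far messier.

```python
from datetime import date, datetime


def parse_us_date(s: str) -> date:
    """Agency A (hypothetical) stores dates as MM/DD/YYYY."""
    return datetime.strptime(s, "%m/%d/%Y").date()


def parse_iso_date(s: str) -> date:
    """Agency B (hypothetical) stores dates as YYYY-MM-DD."""
    return date.fromisoformat(s)


# Hypothetical common data dictionary: for each agency, which source field
# feeds each shared field, and how to convert the raw value.
DICTIONARY = {
    "agency_a": {"person_id": ("client_no", int), "birth_date": ("dob", parse_us_date)},
    "agency_b": {"person_id": ("case_id", int), "birth_date": ("birth_dt", parse_iso_date)},
}


def to_common(agency: str, record: dict) -> dict:
    """Rename and convert one agency record into the shared vocabulary.
    Fields the dictionary doesn't cover are kept under an agency prefix."""
    out = {}
    used = set()
    for common_name, (source_field, convert) in DICTIONARY[agency].items():
        out[common_name] = convert(record[source_field])
        used.add(source_field)
    for field, value in record.items():
        if field not in used:
            out[f"{agency}.{field}"] = value
    return out


def merge(records_a: list, records_b: list, keys=("person_id", "birth_date")) -> list:
    """Inner join of two lists of common-format records on the shared keys.
    Assumes each key combination appears at most once in records_b."""
    index = {tuple(r[k] for k in keys): r for r in records_b}
    merged = []
    for r in records_a:
        match = index.get(tuple(r[k] for k in keys))
        if match is not None:
            merged.append({**r, **match})
    return merged


# Raw records share no field names, so they can't be joined directly;
# after mapping through the dictionary, they can.
a = [to_common("agency_a", {"client_no": "00123", "dob": "03/15/1980", "benefit_amt": "250.00"})]
b = [to_common("agency_b", {"case_id": "123", "birth_dt": "1980-03-15", "case_status": "open"})]
print(merge(a, b))
```

Note that the join only works because both agencies happen to share a person identifier once it’s normalized (`"00123"` and `"123"` both become `123`). If one agency keyed its records on something else entirely, no amount of renaming would rescue the merge, which is exactly the kind of architectural limit described above.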