The growth in the amount of data we can store and access for scientific analysis is creating new opportunities for discovery. It is also driving the development of new statistical methods and techniques. And, as Chris Anderson argues in Wired Magazine, it will make the scientific method obsolete.
Anderson opens with statistician George Box’s well-known remark, “All models are wrong, but some are useful”, and focuses on its first half: “all models are wrong”. Prior to the deluge of data, we were limited to the second half, the idea that “some are useful”. But now we can follow the example set by Google and mine vast amounts of data for patterns in science. “Correlation supersedes causation”, Anderson states in his concluding remarks; in his view, science can move forward without theory or models.
He proposes a revised version, put forth by a research director at Google: “All models are wrong, and increasingly you can succeed without them.” This quote, however, is about success in business, not science, and that is the difference between engineering and science. Models in science represent our understanding of a process or system; they’re not just there to get us an answer. The goal of science is to understand; the goal of engineering is to solve a problem.
The devil is in the details
Then there are the technical arguments. The algorithms Anderson cites to push his ideas forward rest on statistical models whose constraints come from scientific theory. And data mining is no panacea for every research problem: the relevant probability theory requires constraints that cannot be overcome by blind faith in all things Bayesian.
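To make the point about constraints a little more concrete, here is a small simulation I put together as an illustration; it is a sketch under my own assumptions, not anything from Anderson’s article. It generates a set of variables that are, by construction, completely unrelated random noise, then “mines” every pair for correlations. The sizes (200 variables, 20 observations each), the 0.5 threshold, and the helper names `standardize`, `pearson`, and `mine_correlations` are all hypothetical choices made for the example.

```python
import itertools
import math
import random


def standardize(xs):
    """Return the z-scores of xs, using the population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def pearson(za, zb):
    """Pearson correlation of two series that are already standardized."""
    return sum(a * b for a, b in zip(za, zb)) / len(za)


def mine_correlations(n_vars=200, n_obs=20, threshold=0.5, seed=42):
    """Search every pair of independent noise variables for correlations.

    All parameters are hypothetical illustration values. Returns a tuple
    (strongest |r| found, number of pairs whose |r| exceeds threshold).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Each variable is pure Gaussian noise: no real relationships exist.
    data = [standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_vars)]
    best = 0.0
    hits = 0
    for za, zb in itertools.combinations(data, 2):
        r = abs(pearson(za, zb))
        best = max(best, r)
        if r > threshold:
            hits += 1
    return best, hits


if __name__ == "__main__":
    best, hits = mine_correlations()
    print(f"strongest |r| among unrelated variables: {best:.2f}")
    print(f"pairs with |r| > 0.5: {hits}")
```

With roughly 20,000 pairs to search and only 20 observations per variable, you should expect a large number of “strong” correlations that mean nothing at all. Telling those apart from real signal takes assumptions about how the data were generated, which is exactly where theory comes back in.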
Even if you accept that data mining algorithms require constraints grounded in assumptions of some kind, I’m not convinced that they could achieve the level of accuracy required to kill the scientific method. In military intelligence, a domain I’ve had some exposure to over the last couple of years, a good model can pick out a needle in a haystack. And a good model depends on accurate theory.
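A quick way to see why the needle-in-a-haystack problem demands so much accuracy is Bayes’ theorem applied to a rare target. The sketch below is a hypothetical illustration: the function name `posterior_hit_rate` and every number in it (one needle per 10,000 items, 99% sensitivity, the specificity values) are my own assumptions, not figures from any real intelligence system.

```python
def posterior_hit_rate(prior, sensitivity, specificity):
    """P(real target | model flags it), computed with Bayes' theorem.

    prior       -- fraction of the haystack that is actually a needle
    sensitivity -- P(flag | needle)
    specificity -- P(no flag | hay)
    """
    true_flags = sensitivity * prior
    false_flags = (1.0 - specificity) * (1.0 - prior)
    return true_flags / (true_flags + false_flags)


if __name__ == "__main__":
    prior = 1e-4  # hypothetical: one needle per 10,000 items
    for spec in (0.99, 0.999, 0.9999):
        p = posterior_hit_rate(prior, sensitivity=0.99, specificity=spec)
        print(f"specificity {spec}: {p:.1%} of flagged items are real")
```

Under these assumptions, a model that is “99% accurate” in both directions still produces flags that are real only about 1% of the time; it takes a specificity of 99.99% to get to roughly even odds. Knowing the base rate in the first place, and building a model that discriminates that well, depends on understanding the domain rather than on data volume alone.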
Although Anderson loves to highlight the success of Google, the truth is that search is far from perfect. The “semantic web” is touted as the next big thing to advance the science of search, among other things, but it is based on theory as well as data. Intelligent search is not about crunching more data; it’s about understanding information and reasoning.
Statistical algorithms that are being used to “find patterns where science cannot” are actually part of the scientific method. They are tools that help advance science so that we can develop a better understanding of our world, and they will be used to develop and refine theory. And at least one thing is certain: Anderson has succeeded in creating a lot of buzz by putting forward a controversial idea.