If You Torture the Data Long Enough
Gary Smith
The AI Delusion, published 2018-08-23
DOI: https://doi.org/10.1093/oso/9780198824305.003.0008
Citation count: 0
Abstract
I recently received an e-mail that offered me a way to automate my research:

Dear Professor Smith, We would like to introduce you to [our] brand new research tool . . . , ready to automate your empirical research basing on official statistical time series databases. [Our software] has been designed to explore and discover new exciting economic correlations directly from your desktop. No extra software required, no need to crawl thousands of databases manually. You’ll be up and running in no time your first big data project.

The e-mail went on to boast that their software will calculate “correlation coefficients with millions of statistical time series,” “identify unexpected interdependences,” and “find new insights.” The creative grammar was one thing. More disheartening was their assumption that I wanted to sift through literally trillions of correlations looking for unexpected patterns. An unexpected pattern has no logical basis, and I am skeptical of patterns that defy logic. Statistical tests assume that researchers have well-defined theories in mind and gather appropriate data to test their theories. This company assumed that I was eager and willing to pay a substantial amount of money to work the other way around: look at every possible correlation, not caring whether it makes sense or not, and report the correlations that turn out to be the most statistically persuasive. It is a sign of the times, but not an inspiring sign.

Many important scientific theories started out as efforts to explain observed patterns. For example, during the 1800s, most biologists believed that parental characteristics were averaged together to determine the characteristics of their offspring. On this view, a child’s height is an average of the father’s and mother’s heights, modified by environmental influences. However, Gregor Mendel discovered something quite different in his experiments with pea plants. Mendel was born in Austria in 1822 and grew up on his family’s farm. His parents expected him to take over the farm, but Mendel was an excellent student and became an Augustinian monk at a monastery known for its scientific library and research. Perhaps because of his farming roots, Mendel conducted meticulous studies of tens of thousands of pea plants grown in the monastery’s gardens over an eight-year period.
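
The worry about mining every possible correlation can be illustrated with a small simulation. The sketch below is not from the chapter: the function names `pearson` and `mine_noise`, the number of series, the series length, and the seed are all assumptions chosen for illustration. It generates series of pure random noise, correlates every pair, and counts how many pairs clear the conventional 5% significance threshold even though no real relationship exists.

```python
import itertools
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mine_noise(n_series=40, n_obs=20, r_crit=0.444, seed=0):
    """Correlate every pair of pure-noise series and collect the 'discoveries'.

    r_crit=0.444 is approximately the two-sided 5% critical value of |r|
    for 20 observations (18 degrees of freedom). All defaults are
    illustrative assumptions, not values taken from the chapter.
    """
    rng = random.Random(seed)
    # Each series is independent Gaussian noise, so every true correlation is zero.
    series = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_series)]
    pairs = list(itertools.combinations(range(n_series), 2))
    results = [(i, j, pearson(series[i], series[j])) for i, j in pairs]
    hits = [res for res in results if abs(res[2]) > r_crit]
    return len(pairs), hits


if __name__ == "__main__":
    n_pairs, hits = mine_noise()
    print(f"pairs tested: {n_pairs}")
    print(f"'significant' at the 5% level: {len(hits)}")
    # Report the most persuasive-looking correlation found in pure noise.
    i, j, r = max(hits, key=lambda res: abs(res[2]))
    print(f"strongest: series {i} vs series {j}, r = {r:.2f}")
```

With a 5% threshold, roughly one pair in twenty should pass by chance alone, so the number of spurious "discoveries" grows in proportion to the number of correlations examined. Reporting only the strongest of them, as the advertised software proposes, selects for exactly these accidents.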