If You Torture the Data Long Enough

Gary Smith
Source: The AI Delusion. Published 2018-08-23. DOI: 10.1093/oso/9780198824305.003.0008 (https://doi.org/10.1093/oso/9780198824305.003.0008)

Abstract

I recently received an e-mail that offered me a way to automate my research:

Dear Professor Smith, We would like to introduce you to [our] brand new research tool . . . , ready to automate your empirical research basing on official statistical time series databases. [Our software] has been designed to explore and discover new exciting economic correlations directly from your desktop. No extra software required, no need to crawl thousands of databases manually. You’ll be up and running in no time your first big data project.

The e-mail went on to boast that their software will calculate “correlation coefficients with millions of statistical time series,” “identify unexpected interdependences,” and “find new insights.” The creative grammar was one thing. More disheartening was their assumption that I wanted to sift through literally trillions of correlations looking for unexpected patterns. An unexpected pattern has no logical basis, and I am skeptical of patterns that defy logic. Statistical tests assume that researchers have well-defined theories in mind and gather appropriate data to test their theories. This company assumed that I was eager and willing to pay a substantial amount of money to work the other way around: look at every possible correlation, not caring whether they made sense or not, and report the correlations that turn out to be the most statistically persuasive. It is a sign of the times, but not an inspiring sign.

Many important scientific theories started out as efforts to explain observed patterns. For example, during the 1800s, most biologists believed that parental characteristics were averaged together to determine the characteristics of their offspring. On this view, a child’s height is an average of the father’s and mother’s heights, modified by environmental influences. However, Gregor Mendel discovered something quite different in his experiments with pea plants. Mendel was born in Austria in 1822 and grew up on his family’s farm. His parents expected him to take over the farm, but Mendel was an excellent student and became an Augustinian monk at a monastery known for its scientific library and research. Perhaps because of his farming roots, Mendel conducted meticulous studies of tens of thousands of pea plants grown in the monastery’s gardens over an eight-year period.
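
The worry about reporting only the most statistically persuasive correlations can be illustrated with a small simulation. The sketch below is not taken from the chapter. It assumes pure Gaussian noise as a stand-in for "statistical time series", and the function names `pearson` and `mine_correlations`, the series count, the series length, and the threshold `CRITICAL_R` (an approximate two-tailed 5% cutoff for 30 observations) are all illustrative choices.

```python
import itertools
import math
import random

# Approximate two-tailed 5% critical value of |r| for 30 observations
# (28 degrees of freedom). Illustrative assumption, not from the chapter.
CRITICAL_R = 0.361


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mine_correlations(num_series=100, length=30, seed=0):
    """Correlate every pair of unrelated random series.

    Returns a list of (abs_r, i, j) tuples sorted from strongest to weakest.
    The series are independent by construction, so every strong
    correlation found here is a coincidence.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0.0, 1.0) for _ in range(length)]
        for _ in range(num_series)
    ]
    results = [
        (abs(pearson(series[i], series[j])), i, j)
        for i, j in itertools.combinations(range(num_series), 2)
    ]
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    results = mine_correlations()
    significant = [r for r in results if r[0] > CRITICAL_R]
    print(f"pairs tested: {len(results)}")
    print(f"pairs passing the 5% threshold: {len(significant)}")
    print(f"strongest |r| among unrelated series: {results[0][0]:.2f}")
```

With 100 unrelated series there are 4,950 pairs, so roughly one in twenty should pass a 5% significance threshold by chance alone. A search over millions of series would surface far more such coincidences, none of them backed by a theory.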