Confusing Correlation with Causation
G. Smith, J. Cordes
The 9 Pitfalls of Data Science, published 2019-07-01
DOI: https://doi.org/10.1093/oso/9780198844396.003.0008
Citations: 2
Abstract
There is a hierarchy of predictive value that can be extracted from data. At the top of the hierarchy are causal relationships that can be confirmed with a randomized and controlled experiment or a natural experiment. Next best is to establish known or hypothesized relationships ahead of time and then test them and estimate their relative importance. One notch lower are associations found in historical data that are tested on fresh data after considering whether or not they make sense. At the bottom of the hierarchy, with little or no value, are associations found in historical data that are not confirmed by expert opinion or tested with fresh data. Data scientists who use a “correlations are enough” approach should remember that the more data and the more searches, the more likely it is that a discovered statistical relationship is coincidental and useless.
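The closing point about searches can be illustrated with a small simulation. The sketch below is not taken from the chapter; the function names, sample size, and candidate counts are illustrative assumptions. It correlates a pure-noise target with a growing pool of pure-noise candidate variables, keeps the strongest in-sample correlation, and then checks that same candidate against fresh data.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_candidates, n_obs=30, seed=0):
    """Search n_candidates noise variables for the best match to a noise target.

    Returns (in_sample_r, fresh_data_r) for the winning candidate.
    Every series is independent Gaussian noise (an assumption of this
    sketch), so any correlation that turns up is coincidental.
    """
    rng = random.Random(seed)
    target_old = [rng.gauss(0, 1) for _ in range(n_obs)]  # "historical" data
    target_new = [rng.gauss(0, 1) for _ in range(n_obs)]  # "fresh" data
    best_in, best_out = 0.0, 0.0
    for _ in range(n_candidates):
        cand_old = [rng.gauss(0, 1) for _ in range(n_obs)]
        cand_new = [rng.gauss(0, 1) for _ in range(n_obs)]
        r_in = pearson(cand_old, target_old)
        if abs(r_in) > abs(best_in):
            best_in = r_in
            # Re-test the winning candidate on data it was not selected on.
            best_out = pearson(cand_new, target_new)
    return best_in, best_out


if __name__ == "__main__":
    for n in (10, 100, 1000):
        r_in, r_out = best_spurious_correlation(n)
        print(f"{n:>5} searches: in-sample r = {r_in:+.2f}, fresh-data r = {r_out:+.2f}")
```

Because the seed is fixed, a larger search always contains the smaller one, so the strongest in-sample correlation can only grow as more candidates are examined. The fresh-data correlation of the winner has no such guarantee, which is why testing on fresh data sits higher in the hierarchy than an unconfirmed historical association.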