{"title":"数据优先于理论","authors":"G. Smith, J. Cordes","doi":"10.1093/oso/9780198844396.003.0003","DOIUrl":null,"url":null,"abstract":"The traditional statistical analysis of data follows what has come to be known as the scientific method: collecting reliable data to test plausible theories. Data mining goes in the other direction, analyzing data without being motivated or encumbered by theories. The fundamental problem with data mining is simple: We think that data patterns are unusual and therefore meaningful. Patterns are, in fact, inevitable and therefore meaningless. This is why data mining is not usually knowledge discovery, but noise discovery. Finding correlations is easy. Good data scientists are not seduced by discovered patterns because they don’t put data before theory. They do not commit Texas Sharpshooter Fallacies or fall into the Feynman Trap.","PeriodicalId":331229,"journal":{"name":"The 9 Pitfalls of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Putting Data Before Theory\",\"authors\":\"G. Smith, J. Cordes\",\"doi\":\"10.1093/oso/9780198844396.003.0003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The traditional statistical analysis of data follows what has come to be known as the scientific method: collecting reliable data to test plausible theories. Data mining goes in the other direction, analyzing data without being motivated or encumbered by theories. The fundamental problem with data mining is simple: We think that data patterns are unusual and therefore meaningful. Patterns are, in fact, inevitable and therefore meaningless. This is why data mining is not usually knowledge discovery, but noise discovery. Finding correlations is easy. Good data scientists are not seduced by discovered patterns because they don’t put data before theory. They do not commit Texas Sharpshooter Fallacies or fall into the Feynman Trap.\",\"PeriodicalId\":331229,\"journal\":{\"name\":\"The 9 Pitfalls of Data Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 9 Pitfalls of Data Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/oso/9780198844396.003.0003\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 9 Pitfalls of Data Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/oso/9780198844396.003.0003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The traditional statistical analysis of data follows what has come to be known as the scientific method: collecting reliable data to test plausible theories. Data mining goes in the other direction, analyzing data without being motivated or encumbered by theories. The fundamental problem with data mining is simple: We think that data patterns are unusual and therefore meaningful. Patterns are, in fact, inevitable and therefore meaningless. This is why data mining is not usually knowledge discovery, but noise discovery. Finding correlations is easy. Good data scientists are not seduced by discovered patterns because they don’t put data before theory. They do not commit Texas Sharpshooter Fallacies or fall into the Feynman Trap.