{"title":"Fruitless Searches","authors":"Gary Smith, Jay Cordes","doi":"10.1093/oso/9780198864165.003.0007","DOIUrl":"https://doi.org/10.1093/oso/9780198864165.003.0007","url":null,"abstract":"The Internet provides a firehose of data that researchers can use to understand and predict people’s behavior. However, unless A/B tests are used, these data are not from randomized controlled trials that allow us to rule out confounding influences. In addition, the people using the Internet in general, and social media in particular, are surely unrepresentative and their activities should be used cautiously for drawing conclusions about the general population. Things we read or see on the Internet are not necessarily true. Things we do on the Internet are not necessarily informative. An unrestrained scrutiny of searches, updates, tweets, hashtags, images, videos, or captions is certain to turn up an essentially unlimited number of phantom patterns that are entirely coincidental, and completely worthless.","PeriodicalId":333158,"journal":{"name":"The Phantom Pattern Problem","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130775230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Reproducibility Crisis","authors":"G. Smith, J. Cordes","doi":"10.1093/oso/9780198864165.003.0008","DOIUrl":"https://doi.org/10.1093/oso/9780198864165.003.0008","url":null,"abstract":"Attempts to replicate reported studies often fail because the research relied on data mining—searching through data for patterns without any pre-specified, coherent theories. The perils of data mining can be exacerbated by data torturing—slicing, dicing, and otherwise mangling data to create patterns. If there is no underlying reason for a pattern, it is likely to disappear when someone attempts to replicate the study. Big data and powerful computers are part of the problem, not the solution, in that they can easily identify an essentially unlimited number of phantom patterns and relationships, which vanish when confronted with fresh data. If a researcher will benefit from a claim, it is likely to be biased. If a claim sounds implausible, it is probably misleading. If the statistical evidence sounds too good to be true, it probably is.","PeriodicalId":333158,"journal":{"name":"The Phantom Pattern Problem","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131020192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}