Using Bad Data
G. Smith, J. Cordes
The 9 Pitfalls of Data Science, 2019-07-01
DOI: 10.1093/oso/9780198844396.003.0002 (https://doi.org/10.1093/oso/9780198844396.003.0002)

Good data scientists consider the reliability of the data, while data clowns don’t. Reported data sometimes systematically misrepresent the phenomena being recorded. Data can be distorted by extremely unusual values (outliers), which may be clerical errors, measurement errors, or flukes that mislead us if not corrected. Other times, outliers are valuable data. We should always consider whether data are skewed by unusual events or distorted by unreported “silent data.” If something is surprising about top-ranked groups, look at the bottom-ranked groups. Consider the possibility of survivorship bias and self-selection bias. Incomplete, inaccurate, or unreliable data can make clowns out of anyone.