George Hripcsak, Charles Knirsch, Li Zhou, Adam Wilcox, Genevieve Melton
{"title":"Bias associated with mining electronic health records.","authors":"George Hripcsak, Charles Knirsch, Li Zhou, Adam Wilcox, Genevieve Melton","doi":"10.5210/disco.v6i0.3581","DOIUrl":null,"url":null,"abstract":"<p><p>Large-scale electronic health record research introduces biases compared to traditional manually curated retrospective research. We used data from a community-acquired pneumonia study for which we had a gold standard to illustrate such biases. The challenges include data inaccuracy, incompleteness, and complexity, and they can produce in distorted results. We found that a naïve approach approximated the gold standard, but errors on a minority of cases shifted mortality substantially. Manual review revealed errors in both selecting and characterizing the cohort, and narrowing the cohort improved the result. Nevertheless, a significantly narrowed cohort might contain its own biases that would be difficult to estimate.</p>","PeriodicalId":87404,"journal":{"name":"Journal of biomedical discovery and collaboration","volume":"6 ","pages":"48-52"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.5210/disco.v6i0.3581","citationCount":"74","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of biomedical discovery and collaboration","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5210/disco.v6i0.3581","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 74
Abstract
Large-scale electronic health record research introduces biases compared to traditional manually curated retrospective research. We used data from a community-acquired pneumonia study for which we had a gold standard to illustrate such biases. The challenges include data inaccuracy, incompleteness, and complexity, and they can produce in distorted results. We found that a naïve approach approximated the gold standard, but errors on a minority of cases shifted mortality substantially. Manual review revealed errors in both selecting and characterizing the cohort, and narrowing the cohort improved the result. Nevertheless, a significantly narrowed cohort might contain its own biases that would be difficult to estimate.