Outliers in diagnosis ratios: A clue toward possibly absent data.
Authors: Dmitry Morozyuk, Mark G Weiner
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium, volume 2023, pages 1175-1182
Publication date: 2024-01-11 (eCollection 2023)
Publication type: Journal Article
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785923/pdf/
Citation count: 0
Abstract
Evaluating the completeness of real-world data is a particularly challenging component of data quality assessment because the degree of truly versus erroneously absent data is unknown. Among inpatient data sets, absolute counts of admissions having specific categories of diagnoses in the principal or any position may vary with hospital size, but we hypothesized that the ratio of principal-position to any-position counts would be preserved across sites, with outliers suggesting the potential for erroneously absent data. For several categories of clinical conditions assigned to inpatient admissions, we analyzed the ratio of their recording as the principal diagnosis versus any diagnosis across several hospitals and compared the ratios against a national benchmark. Our analysis showed ratios that matched clinical expectations, with reasonable preservation of ratios across sites. However, some conditions exhibited more variability in the ratios, and some sites had many outliers, possibly reflecting data quality issues that warrant further attention.
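
The ratio comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how outliers were identified, so the relative-deviation rule, the `tolerance` value, the `DiagnosisCounts` structure, and all counts and benchmark values below are hypothetical and made up purely for illustration; they are not the authors' method or data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DiagnosisCounts:
    """Admission counts for one condition category at one site (hypothetical structure)."""
    principal: int     # admissions with the condition as the principal diagnosis
    any_position: int  # admissions with the condition in any diagnosis position


def principal_to_any_ratio(counts: DiagnosisCounts) -> Optional[float]:
    """Return principal / any-position, or None when the condition was never recorded."""
    if counts.any_position == 0:
        return None
    return counts.principal / counts.any_position


def flag_outliers(
    site_counts: Dict[Tuple[str, str], DiagnosisCounts],
    benchmark_ratios: Dict[str, float],
    tolerance: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Flag (site, condition, ratio) where the ratio deviates from the benchmark
    by more than `tolerance`, measured relative to the benchmark ratio."""
    flagged = []
    for (site, condition), counts in site_counts.items():
        ratio = principal_to_any_ratio(counts)
        benchmark = benchmark_ratios.get(condition)
        if ratio is None or not benchmark:
            continue  # nothing to compare against
        if abs(ratio - benchmark) / benchmark > tolerance:
            flagged.append((site, condition, ratio))
    return flagged


# Made-up illustrative numbers only.
example_counts = {
    ("site_A", "sepsis"): DiagnosisCounts(principal=40, any_position=100),
    ("site_B", "sepsis"): DiagnosisCounts(principal=10, any_position=100),
}
example_benchmark = {"sepsis": 0.4}

for site, condition, ratio in flag_outliers(example_counts, example_benchmark):
    print(f"{site} / {condition}: ratio {ratio:.2f} deviates from benchmark")
```

A flagged site is only a prompt for review: a ratio far from the benchmark may reflect missing secondary diagnoses, but it may also reflect a genuinely different case mix.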