Quantitatively assessing the impact of the quality of SNOMED CT subtype hierarchy on cohort queries.

Impact factor 4.7; Zone 2 (Medicine); JCR Q1 (Computer Science, Information Systems)

Journal of the American Medical Informatics Association Pub Date : 2025-01-01 DOI:10.1093/jamia/ocae272

Xubing Hao, Xiaojin Li, Yan Huang, Jay Shi, Rashmie Abeysinghe, Cui Tao, Kirk Roberts, Guo-Qiang Zhang, Licong Cui

Pages: 89-96. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648736/pdf/

Citation count: 0

Abstract

Objective: SNOMED CT provides a standardized terminology for clinical concepts, allowing cohort queries over heterogeneous clinical data including Electronic Health Records (EHRs). While it is intuitive that missing and inaccurate subtype (or is-a) relations in SNOMED CT reduce the recall and precision of cohort queries, the extent of these impacts has not been formally assessed. This study fills this gap by developing quantitative metrics to measure these impacts and performing statistical analysis on their significance.
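To make the intuition concrete, here is a minimal sketch of a hierarchy-based cohort query, assuming a query retrieves every patient with at least one code that is a descendant of the query concept. The toy concept names, the `children` mapping, and the helper functions are hypothetical illustrations, not the study's implementation or real SNOMED CT content.

```python
def descendants(children, root):
    """Return root plus every concept reachable from it via is-a (parent -> child) edges."""
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def cohort(children, patient_codes, query_concept):
    """Patients with at least one recorded code subsumed by query_concept."""
    codes = descendants(children, query_concept)
    return {p for p, recorded in patient_codes.items() if recorded & codes}


# Hypothetical toy hierarchy (parent -> children).
complete = {"lung disease": {"pneumonia"}, "pneumonia": {"viral pneumonia"}}
# Same hierarchy with the relation "viral pneumonia is-a pneumonia" missing.
defective = {"lung disease": {"pneumonia"}}

# Hypothetical patient records (patient id -> recorded concepts).
patients = {"p1": {"pneumonia"}, "p2": {"viral pneumonia"}, "p3": {"fracture"}}

full = cohort(complete, patients, "lung disease")      # reference cohort
partial = cohort(defective, patients, "lung disease")  # cohort under the defect
recall = len(partial & full) / len(full)
print(sorted(full), sorted(partial), recall)
```

In this toy case the missing relation silently drops patient `p2` from the cohort, so recall against the complete hierarchy falls to 0.5. An inaccurate relation has the mirror-image effect: it pulls in patients who should not match, lowering precision.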

Material and methods: We used the Optum de-identified COVID-19 Electronic Health Record dataset. We defined micro-averaged and macro-averaged recall and precision metrics to assess the impact of missing and inaccurate is-a relations on cohort queries. Both practical and simulated analyses were performed. Practical analyses involved 407 missing and 48 inaccurate is-a relations confirmed by domain experts, with statistical testing using Wilcoxon signed-rank tests. Simulated analyses used two random sets of 400 is-a relations to simulate missing and inaccurate is-a relations.
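The abstract does not give the metric formulas, so the following is only a sketch under the standard definitions: micro-averaging pools patient counts across all queries before dividing, while macro-averaging computes the metric per query and then takes the unweighted mean. The function name and the toy cohorts are assumptions for illustration.

```python
def micro_macro(results):
    """Compute micro- and macro-averaged recall and precision.

    results: list of (retrieved, relevant) pairs of patient-id sets, one per
    cohort query. Every set is assumed non-empty in this sketch.
    """
    tp = [len(retrieved & relevant) for retrieved, relevant in results]
    n_retrieved = [len(retrieved) for retrieved, _ in results]
    n_relevant = [len(relevant) for _, relevant in results]

    # Micro: sum the counts over all queries, then divide once.
    micro_recall = sum(tp) / sum(n_relevant)
    micro_precision = sum(tp) / sum(n_retrieved)

    # Macro: divide per query, then average the per-query ratios.
    macro_recall = sum(t / r for t, r in zip(tp, n_relevant)) / len(results)
    macro_precision = sum(t / r for t, r in zip(tp, n_retrieved)) / len(results)

    return {
        "micro_recall": micro_recall,
        "micro_precision": micro_precision,
        "macro_recall": macro_recall,
        "macro_precision": macro_precision,
    }


# Hypothetical queries: the first misses one patient (as a missing is-a relation
# would cause); the second retrieves one extra patient (as an inaccurate one would).
toy_results = [
    ({1, 2, 3}, {1, 2, 3, 4}),
    ({5, 6}, {5}),
]
scores = micro_macro(toy_results)
print(scores)
```

Micro-averaged values are dominated by queries with large cohorts, whereas macro-averaged values weight every query equally, which is presumably why both are reported.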

Results: Wilcoxon signed-rank tests from both practical and simulated analyses (P-values < .001) showed that missing is-a relations significantly reduced the micro- and macro-averaged recall, and inaccurate is-a relations significantly reduced the micro- and macro-averaged precision.
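For readers unfamiliar with the test, the sketch below shows a simplified Wilcoxon signed-rank test on paired per-query recall values. It is an assumption-laden illustration: it drops zero differences, averages tied ranks, and uses a plain normal approximation without continuity or tie corrections, so it will not match a statistics package exactly. The recall values are invented and are not the study's data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Simplified two-sided Wilcoxon signed-rank test on paired samples.

    Returns (W, p) where W is the smaller of the positive and negative rank
    sums and p comes from a normal approximation (no tie or continuity correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p


# Invented per-query recall: complete hierarchy vs. hierarchy with missing relations.
recall_complete = [1.0] * 10
recall_defective = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.75, 0.99, 0.5, 0.65]

w, p = wilcoxon_signed_rank(recall_complete, recall_defective)
print(w, p)
```

A paired, rank-based test fits this setting because each query is evaluated twice (with and without the defect) and per-query recall or precision values need not be normally distributed. In practice an established implementation such as `scipy.stats.wilcoxon` would be preferable to this hand-rolled version.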

Discussion: The introduced impact metrics can assist SNOMED CT maintainers in prioritizing critical hierarchical defects for quality enhancement. These metrics are generally applicable for assessing the quality impact of a terminology's subtype hierarchy on its cohort query applications.

Conclusion: Our results indicate a significant impact of missing and inaccurate is-a relations in SNOMED CT on the recall and precision of cohort queries. Our work highlights the importance of high-quality terminology hierarchy for cohort queries over EHR data and provides valuable insights for prioritizing quality improvements of SNOMED CT's hierarchy.

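Source journal

Journal of the American Medical Informatics Association (category: Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks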

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.