Reducing diagnostic delays in acute hepatic porphyria using health records data and machine learning.

Impact factor 4.7; Zone 2 (Medicine); Q1, Computer Science, Information Systems

Journal of the American Medical Informatics Association. Pub Date: 2025-01-01. DOI: 10.1093/jamia/ocae141

Balu Bhasuran, Katharina Schmolly, Yuvraaj Kapoor, Nanditha Lakshmi Jayakumar, Raymond Doan, Jigar Amin, Stephen Meninger, Nathan Cheng, Robert Deering, Karl Anderson, Simon W Beaven, Bruce Wang, Vivek A Rudrapatna

Pages: 63-70. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648717/pdf/

Citations: 0

Abstract

Background: Acute hepatic porphyria (AHP) is a group of rare but treatable conditions associated with diagnostic delays of 15 years on average. The advent of electronic health records (EHR) data and machine learning (ML) may improve the timely recognition of rare diseases like AHP. However, prediction models can be difficult to train given the limited case numbers, unstructured EHR data, and selection biases intrinsic to healthcare delivery. We sought to train and characterize models for identifying patients with AHP.

Methods: This diagnostic study used structured and notes-based EHR data from 2 centers at the University of California, UCSF (2012-2022) and UCLA (2019-2022). The data were split into 2 cohorts (referral and diagnosis) and used to develop models that predict (1) who will be referred for testing of acute porphyria, among those who presented with abdominal pain (a cardinal symptom of AHP), and (2) who will test positive, among those referred. The referral cohort consisted of 747 patients referred for testing and 99 849 contemporaneous patients who were not. The diagnosis cohort consisted of 72 confirmed AHP cases and 347 patients who tested negative. The case cohort was 81% female and 6-75 years old at the time of diagnosis. Candidate models used a range of architectures. Feature selection was semi-automated and incorporated publicly available data from knowledge graphs. Our primary outcome was the F-score on an outcome-stratified test set.
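The primary outcome above, an F-score on an outcome-stratified test set, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the split fraction or which F-score variant was used, so the 20% test fraction, the default `beta=1.0` (F1), and the function names `stratified_split` and `f_score` are assumptions made for illustration. Only the cohort sizes (72 cases, 347 negatives) come from the text.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps roughly the same share.

    test_fraction and seed are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels (1 = positive); beta=1.0 gives F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Diagnosis cohort sizes from the abstract: 72 confirmed cases, 347 negatives.
labels = [1] * 72 + [0] * 347
train_idx, test_idx = stratified_split(labels)
test_positives = sum(labels[i] for i in test_idx)
```

Stratifying on the outcome matters here because cases make up only about 17% of the diagnosis cohort; an unstratified split could leave the test set with too few positives for a stable F-score.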

Results: The best center-specific referral models achieved an F-score of 86%-91%. The best diagnosis model achieved an F-score of 92%. To further test our model, we contacted 372 current patients who lack an AHP diagnosis but were predicted by our models as potentially having it (≥10% probability of referral, ≥50% of testing positive). However, we were only able to recruit 10 of these patients for biochemical testing, all of whom were negative. Nonetheless, post hoc evaluations suggested that these models could identify 71% of cases earlier than their diagnosis date, saving 1.2 years.
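The screening rule quoted above (≥10% probability of referral, ≥50% probability of testing positive) can be sketched as a two-stage filter. This is a hedged illustration only: the abstract does not say how the two model outputs were combined, so requiring both thresholds at once is an assumption, and the `Prediction` record, its field names, and `flag_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical per-patient output of the two models."""
    patient_id: str
    p_referral: float  # referral model: probability of being referred for testing
    p_positive: float  # diagnosis model: probability of testing positive


def flag_candidates(preds, referral_threshold=0.10, positive_threshold=0.50):
    """Return IDs of patients meeting both thresholds (AND rule is an assumption)."""
    return [
        p.patient_id
        for p in preds
        if p.p_referral >= referral_threshold and p.p_positive >= positive_threshold
    ]


# Made-up example values, not study data.
example = [
    Prediction("A", p_referral=0.05, p_positive=0.90),  # fails referral threshold
    Prediction("B", p_referral=0.10, p_positive=0.50),  # meets both (inclusive)
    Prediction("C", p_referral=0.40, p_positive=0.20),  # fails diagnosis threshold
]
flagged = flag_candidates(example)
```

The thresholds are inclusive, matching the "≥" in the text.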

Conclusions: ML can reduce diagnostic delays in AHP and other rare diseases. Robust recruitment strategies and multicenter coordination will be needed to validate these models before they can be deployed.




Source journal

Journal of the American Medical Informatics Association (Medicine; Computer Science: Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.