A machine learning model for early diagnosis of type 1 Gaucher disease using real-life data

IF 7.3 | Zone 2 (Medicine) | JCR Q1: Health Care Sciences & Services
Avraham Tenenbaum , Shoshana Revel-Vilk , Sivan Gazit , Michael Roimi , Aidan Gill , Dafna Gilboa , Ora Paltiel , Orly Manor , Varda Shalev , Gabriel Chodick
Journal of Clinical Epidemiology, volume 175, Article 111517
Published: 2024-09-06 (Journal Article)
DOI: 10.1016/j.jclinepi.2024.111517
URL: https://www.sciencedirect.com/science/article/pii/S0895435624002737

Abstract

Objective

The diagnosis of Gaucher disease (GD) presents a major challenge due to the high variability and low specificity of its clinical characteristics, along with limited physician awareness of the disease’s early symptoms. Early and accurate diagnosis is important to enable effective treatment decisions, prevent unnecessary testing, and facilitate genetic counseling. This study aimed to develop a machine learning (ML) model for GD screening and early diagnosis based on real-world clinical data from the Maccabi Healthcare Services electronic database, which contains 20 years of longitudinal data on approximately 2.6 million patients.

Study Design and Setting

We screened the Maccabi Healthcare Services database for patients with GD between January 1998 and May 2022. Eligible controls were matched by year of birth, sex, and socioeconomic status in a 1:13 ratio. The data were partitioned into 75% training and 25% test sets, and models were trained to predict GD using features obtained from medical and laboratory records. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
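The evaluation setup described above (a 75%/25% partition scored by the area under the receiver operating characteristic curve and the area under the precision-recall curve) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not say whether the split was stratified, which learning algorithm was used, or how the precision-recall area was computed, so `stratified_split`, `auroc`, and `average_precision` are hypothetical helper names, and the step-wise average precision shown here is one common estimator of that area.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices into train/test while keeping the case/control mix.

    Assumption: the abstract does not state that the split was stratified.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Precision is averaged over the ranks at which a true case is retrieved.
    Tied scores are ranked in input order, which is a simplification.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print("AUROC:", auroc(y, s))
    print("Average precision:", average_precision(y, s))
```

With heavily imbalanced data such as a 1:13 case-control design, reporting the precision-recall area alongside the ROC area is informative because precision depends directly on how rare the cases are.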

Results

We identified 264 patients with confirmed GD, to whom we matched 3,429 controls. On the test set, the best-performing model (which included known GD signs and symptoms, previously unknown clinical features, and administrative codes) achieved an area under the receiver operating characteristic curve of 0.95 ± 0.03 and an area under the precision-recall curve of 0.80 ± 0.08, and identified GD a median of 2.78 years earlier than the clinical diagnosis (25th–75th percentile: 1.29–4.53 years).
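The reported counts can be put in context with a little arithmetic. The sketch below checks the realized matching ratio and computes the share of cases in the matched cohort, which is roughly the precision-recall area that a classifier with no discriminative power would achieve. It assumes the test set has the same case share as the full matched cohort, which the abstract does not state; the variable names are illustrative.

```python
cases = 264
controls = 3429
reported_auprc = 0.80  # test-set value reported in the abstract

total = cases + controls

# Realized controls-per-case ratio (the design target was 1:13).
ratio = controls / cases

# Share of cases in the matched cohort. A random classifier's
# precision-recall area is approximately this value.
prevalence = cases / total

# How many times the reported value exceeds that chance-level baseline,
# assuming the test set has the same case share as the whole cohort.
lift = reported_auprc / prevalence

print(f"controls per case: {ratio:.2f}")
print(f"case share in cohort: {prevalence:.4f}")
print(f"AUPRC relative to chance: {lift:.1f}x")
```

Note that this case share reflects the matched study design, not the frequency of GD in the general population, so precision in real-world screening would be expected to differ.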

Conclusion

Using an ML approach on real-world data led to excellent discrimination between GD patients and controls, with the ability to detect GD significantly earlier than the time of actual diagnosis. Hence, this approach might be useful as a screening tool for GD and lead to earlier diagnosis and treatment. Furthermore, advanced ML analytics may highlight previously unrecognized features associated with GD, including clinical diagnoses and health-seeking behaviors.

Plain Language Summary

Diagnosing Gaucher disease is difficult, which often leads to late or incorrect diagnoses. As a result, patients may undergo unnecessary tests and treatments and experience health deterioration despite the availability of medications for Gaucher disease. In this study, we used electronic health data to develop machine learning models for early diagnosis of Gaucher disease type 1. Our models, which included known Gaucher disease signs and symptoms, previously unknown clinical features, and administrative codes, were able to significantly outperform other models and expert opinions, detecting type 1 Gaucher disease 3 years on average before actual diagnosis. Our models also revealed new features linked to type 1 Gaucher disease, including specific diagnoses and patterns in patients’ healthcare-seeking behaviors. We believe that machine learning can be a valuable tool for patients with rare diseases.


Source journal
Journal of Clinical Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 12.00
Self-citation rate: 6.90%
Articles published: 320
Review time: 44 days
About the journal: The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.