Development of machine learning-based mpox surveillance models in a learning health system.

IF 2.9 | Zone 3, Medicine | JCR Q2, Infectious Diseases
Harry Reyes Nieva, Jason Zucker, Emma Tucker, Jacob McLean, Clare DeLaurentis, Shauna Gunaratne, Noémie Elhadad
Journal: Sexually Transmitted Infections
Publication date: 2025-05-02
Publication type: Journal Article
DOI: 10.1136/sextrans-2024-056382 (https://doi.org/10.1136/sextrans-2024-056382)
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12353557/pdf/

Abstract

Objectives: This study aimed to develop robust machine learning (ML)-based and deep learning (DL)-based models capable of detecting mpox cases for surveillance efforts using clinical notes.

Methods: As part of a learning health system initiative, we conducted a retrospective study of clinical encounters at the Columbia University Irving Medical Center in New York City. We included patients with mpox diagnoses confirmed by PCR testing between 15 May 2022 and 15 October 2022 and three matched controls for each case based on patient age, sex, race, ethnicity and visit month. We trained three mpox surveillance models using: (1) logistic regression with L1 regularisation (least absolute shrinkage and selection operator (LASSO)), (2) ClinicalBERT and (3) ClinicalLongformer. We evaluated model performance using precision, recall, F1 score, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC) and recall at 80% precision (RP80).
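
Of the metrics listed above, RP80 is the least standard, so a small sketch may help. The abstract does not say how thresholds were chosen, so the code below assumes a common definition: the highest recall reached at any score threshold where precision is at least 0.80. The function name `recall_at_precision` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def recall_at_precision(y_true, y_score, min_precision=0.80):
    """Highest recall over all score thresholds whose precision >= min_precision.

    Assumed definition of RP80 (with min_precision=0.80); the study's exact
    procedure is not described in the abstract.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume every example tied at this score so a threshold never splits ties.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
    return best_recall


# Toy data: 1 = case, 0 = control; scores are made-up model probabilities.
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
rp80 = recall_at_precision(labels, scores)
print(f"RP80 on toy data: {rp80:.2f}")
```

A fixed precision floor suits surveillance use: it caps the share of flagged encounters that turn out to be false alarms, then asks how many true cases are still caught under that constraint.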

Results: The study included 228 PCR-confirmed mpox cases and 698 controls. LASSO regression outperformed the DL models with a precision, recall and F1 score of 0.93, AUROC of 0.97, AUPRC of 0.93 and RP80 of 0.89. ClinicalBERT achieved a precision of 0.88, recall of 0.89, F1 score of 0.88 and AUROC of 0.93. ClinicalLongformer achieved a precision of 0.87, recall of 0.88, F1 score of 0.87 and AUROC of 0.92. Phrases related to symptoms (eg, lesions and pain) were among the most predictive features in LASSO regression.
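
As a quick consistency check on the reported figures, the F1 score can be recomputed as the harmonic mean of precision and recall. This is only a sketch: the abstract does not state whether the metrics were averaged across folds or classes, so the recomputed values are expected to match the reported ones only to within rounding. The helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the Results paragraph.
reported = {
    "LASSO": (0.93, 0.93, 0.93),
    "ClinicalBERT": (0.88, 0.89, 0.88),
    "ClinicalLongformer": (0.87, 0.88, 0.87),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1 = f1_from(precision, recall)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {f1_reported:.2f}")
```

For all three models the recomputed value falls within 0.01 of the reported F1 score, which is consistent with the inputs themselves being rounded to two decimal places.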

Conclusions: ML and DL models based on clinical notes show promise for identifying mpox cases. In this study, LASSO regression outperformed DL models and excelled in minimising false positives. These findings highlight the potential for ML and DL methods to support case surveillance for mpox and other infectious diseases. These methods may also prove helpful for flagging missed or delayed diagnoses as part of continuous quality improvement.

Source journal
Sexually Transmitted Infections (Medicine: Infectious Diseases)
CiteScore: 5.70
Self-citation rate: 8.30%
Articles published: 96
Review time: 4-8 weeks
Journal description: Sexually Transmitted Infections is the world's longest running international journal on sexual health. It aims to keep practitioners, trainees and researchers up to date in the prevention, diagnosis and treatment of all STIs and HIV. The journal publishes original research, descriptive epidemiology, evidence-based reviews and comment on the clinical, public health, sociological and laboratory aspects of sexual health from around the world. It also publishes educational articles, letters and other material of interest to readers, along with podcasts and other online material. STI provides a high quality editorial service from submission to publication.