IF 2.9 | Zone 3, Medicine | JCR Q2: HEALTH CARE SCIENCES & SERVICES
DIGITAL HEALTH Pub Date : 2025-04-03 eCollection Date: 2025-01-01 DOI:10.1177/20552076251331895
Lorenzo Argante, Germain Lonnet, Emmanuel Aris, Jane Whelan
Journal: DIGITAL HEALTH, volume 11, article 20552076251331895. Publication type: Journal Article.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11970062/pdf/
Citations: 0

Abstract
Beyond the STI clinic: Use of administrative claims data and machine learning to develop and validate patient-level prediction models for gonorrhea.

Background: Gonorrhea is a sexually transmitted infection (STI) that, untreated, can result in debilitating complications such as pelvic inflammatory disease, pain, and infertility. A minority of cases are diagnosed in STI clinics in the United States. Gonorrhea is often asymptomatic and presumed to be substantially underdiagnosed and/or undertreated.

Objectives: To generate and compare predictive machine learning (ML) models using administrative claims data to characterize young women in the general United States population who would be most likely to contract gonorrhea.

Methods: Data were extracted from the Merative™ MarketScan® Commercial and Medicaid databases containing routinely collected administrative claims data. Women aged 16-35 years with two years of continuous observation between 1 January 2017 and 31 December 2018 were included. ML classification models were constructed based on logistic regression and tree-based algorithms.
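The inclusion criteria above can be sketched as a simple filter. This is only an illustration: the `Enrollee` record and its field names are hypothetical and are not the MarketScan schema, and approximating age as of the start of the window is an assumption, since the abstract does not say when age was assessed.

```python
from dataclasses import dataclass
from datetime import date

# Study window taken from the Methods paragraph.
WINDOW_START = date(2017, 1, 1)
WINDOW_END = date(2018, 12, 31)


@dataclass
class Enrollee:
    """Hypothetical, simplified enrollment record (not the MarketScan schema)."""
    sex: str             # "F" or "M"
    birth_year: int
    enrolled_from: date  # first day of continuous enrollment
    enrolled_to: date    # last day of continuous enrollment


def is_eligible(p: Enrollee) -> bool:
    """Woman aged 16-35 who is continuously enrolled across the whole window."""
    # Assumption: age is approximated as of the start of the window.
    age = WINDOW_START.year - p.birth_year
    covers_window = p.enrolled_from <= WINDOW_START and p.enrolled_to >= WINDOW_END
    return p.sex == "F" and 16 <= age <= 35 and covers_window


people = [
    Enrollee("F", 1995, date(2016, 6, 1), date(2019, 3, 31)),   # meets all criteria
    Enrollee("F", 1995, date(2017, 2, 1), date(2019, 3, 31)),   # enrollment starts too late
    Enrollee("F", 1975, date(2015, 1, 1), date(2019, 12, 31)),  # outside the age range
    Enrollee("M", 1995, date(2015, 1, 1), date(2019, 12, 31)),  # not a woman
]
cohort = [p for p in people if is_eligible(p)]
```

Only the first toy record passes all three checks; the others each fail exactly one criterion.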

Results: Models built with tree-based algorithms such as XGBoost discriminated best, but simpler ridge regression models with splines also achieved reasonable discrimination, allowing identification of population subsets at increased risk of gonorrhea infection. A subset of 0.1% of the population identified by the XGBoost model had a 70-fold higher risk of gonorrhea than the general population. In external validation, the models were applied to a Medicaid dataset that was not used in model development, and their performance was deemed acceptable.
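A "fold higher risk" figure of this kind is the ratio of the observed risk in a model-selected subset to the risk in the whole population. The sketch below shows that arithmetic on synthetic scores and outcomes. It is not the authors' code or data, the function name `lift_at_top_fraction` is hypothetical, and selecting the subset as the top-scoring fraction is an assumption about how such a subset would be chosen.

```python
def lift_at_top_fraction(scores, outcomes, fraction):
    """Risk among the top-scoring `fraction` of people divided by overall risk."""
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equally long")
    n_top = max(1, round(fraction * len(scores)))
    # Rank people from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_risk = sum(y for _, y in ranked[:n_top]) / n_top
    overall_risk = sum(outcomes) / len(outcomes)
    if overall_risk == 0:
        raise ValueError("no cases in the population; lift is undefined")
    return top_risk / overall_risk


# Synthetic example: 1000 people, 10 cases in total (overall risk 1%).
scores = [i / 1000 for i in range(1000)]   # stand-in for model predictions
outcomes = [0] * 1000
for idx in (995, 996, 997, 998, 999):      # 5 cases among the 10 top-ranked people
    outcomes[idx] = 1
for idx in (0, 100, 200, 300, 400):        # 5 cases the scores rank low
    outcomes[idx] = 1

lift = lift_at_top_fraction(scores, outcomes, fraction=0.01)
```

In this toy data the top 1% holds half of all cases, so its risk is 50% against 1% overall. The same ratio, computed on real predictions at a 0.1% cut-off, is what a 70-fold figure would express.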

Conclusions: The models and methods presented here could facilitate the identification of women at high risk of contracting gonorrhea for whom targeted preventive measures may be most beneficial.

Source journal
DIGITAL HEALTH Multiple-
CiteScore: 2.90
Self-citation rate: 7.70%
Articles published: 302