Development and validation of prediction models for stroke and myocardial infarction in type 2 diabetes based on health insurance claims: does machine learning outperform traditional regression approaches?

IF 8.5 · Zone 1 (Medicine) · JCR Q1 (Cardiac & Cardiovascular Systems)
Anna-Janina Stephan, Michael Hanselmann, Medina Bajramovic, Simon Schosser, Michael Laxy
Cardiovascular Diabetology 24(1):80. Published 2025-02-18. DOI: 10.1186/s12933-025-02640-9 (https://doi.org/10.1186/s12933-025-02640-9)
Citations: 0

Abstract

Background: Digitalization and big health system data open new avenues for targeted prevention and treatment strategies. We aimed to develop and validate prediction models for stroke and myocardial infarction (MI) in patients with type 2 diabetes based on routinely collected high-dimensional health insurance claims, and to compare the predictive performance of traditional regression with state-of-the-art machine learning, including deep learning methods.

Methods: We used German health insurance claims from 2014 to 2019 with 287 potentially relevant literature-derived variables to predict 3-year risk of MI and stroke. Following a train-test split approach, we compared the performance of logistic methods with and without forward selection, LASSO regularization, random forests (RF), gradient boosting (GB), multilayer perceptrons (MLP) and feature-tokenizer transformers (FTT). We assessed discrimination (Areas Under the Precision-Recall and Receiver Operating Characteristic Curves, AUPRC and AUROC) and calibration.
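The two discrimination metrics named above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `auroc` and `average_precision` and the toy labels and scores are hypothetical. AUPRC is approximated here by average precision (a common step-wise estimate), and the average-precision function assumes untied scores.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as 0.5. Quadratic time; for illustration only."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@k over the ranks k at which a
    positive case appears. Assumes scores are not tied."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision among the top-k ranked cases
    return total / n_pos


# Hypothetical toy data: 3 events among 8 patients.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.10, 0.20, 0.80, 0.30, 0.40, 0.50, 0.05, 0.90]

toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(f"AUROC={toy_auroc:.3f}, average precision={toy_ap:.3f}")
```

Unlike AUROC, average precision depends on how rare the outcome is, which is why the two metrics can give different impressions for low-prevalence events such as MI and stroke.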

Results: Among n = 371,006 patients with type 2 diabetes (mean age: 67.2 years), 3.5% (n = 13,030) had MIs and 3.4% (n = 12,701) strokes. AUPRCs were 0.035 (MI) and 0.034 (stroke) for a null model, between 0.082 (MLP) and 0.092 (GB) for MI, and between 0.061 (MLP) and 0.073 (GB) for stroke. AUROCs were 0.5 for null models, between 0.70 (RF, MLP, FTT) and 0.71 (all other models) for MI, and between 0.66 (MLP) and 0.69 (GB) for stroke. All models were well calibrated.
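The null-model AUPRCs reported above coincide with the outcome prevalences, since a model with no discriminative ability has an expected precision equal to the event rate at every recall level. The short sketch below reproduces that arithmetic from the reported counts and expresses the best reported AUPRC (GB) as a multiple of the null baseline. The variable and function names are illustrative assumptions, and the ratio is a derived illustration, not a figure reported by the authors.

```python
# Counts and best AUPRCs as reported in the Results paragraph.
N_PATIENTS = 371_006
EVENTS = {"MI": 13_030, "stroke": 12_701}
BEST_AUPRC = {"MI": 0.092, "stroke": 0.073}  # gradient boosting (GB)


def prevalence(n_events: int, n_total: int) -> float:
    """Event rate; also the expected AUPRC of a non-informative model."""
    return n_events / n_total


summary = {}
for outcome, n_events in EVENTS.items():
    base = prevalence(n_events, N_PATIENTS)
    ratio = BEST_AUPRC[outcome] / base  # best AUPRC relative to the null model
    summary[outcome] = (base, ratio)
    print(f"{outcome}: null AUPRC ~ {base:.3f}, best/null ratio ~ {ratio:.1f}")
```

Read this way, the best models reach roughly two to three times the null AUPRC, even though the absolute values stay below 0.1.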

Conclusions: Discrimination performance of claims-based models reached a ceiling at around 0.09 AUPRC and 0.7 AUROC. While for AUROC this performance was comparable to existing epidemiological models incorporating clinical information, comparison of other, potentially more relevant metrics, such as AUPRC, sensitivity and Positive Predictive Value, was hampered by lack of reporting in the literature. The fact that machine learning, including deep learning methods, did not outperform more traditional approaches may suggest that feature richness and complexity were exploited before the choice of algorithm could become critical to maximizing performance. Future research might focus on the impact of different feature derivation approaches on performance ceilings. In the absence of other more powerful screening alternatives, applying transparent regression-based models in routine claims, though certainly imperfect, remains a promising scalable low-cost approach for population-based cardiovascular risk prediction and stratification.

Source journal: Cardiovascular Diabetology (Medicine: Endocrinology & Metabolism)
CiteScore: 12.30
Self-citation rate: 15.10%
Articles published: 240
Review time: 1 month
Journal description: Cardiovascular Diabetology is a journal that welcomes manuscripts exploring various aspects of the relationship between diabetes, cardiovascular health, and the metabolic syndrome. We invite submissions related to clinical studies, genetic investigations, experimental research, pharmacological studies, epidemiological analyses, and molecular biology research in this field.