Explainable Prediction of Long-Term Glycated Hemoglobin Response Change in Finnish Patients with Type 2 Diabetes Following Drug Initiation Using Evidence-Based Machine Learning Approaches.

IF 3.2, Zone 2 (Medicine), Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Clinical Epidemiology Pub Date : 2025-03-08 eCollection Date: 2025-01-01 DOI:10.2147/CLEP.S505966

Gunjan Chandra, Piia Lavikainen, Pekka Siirtola, Satu Tamminen, Anusha Ihalapathirana, Tiina Laatikainen, Janne Martikainen, Juha Röning

Clinical Epidemiology, volume 17, pages 225-240. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899941/pdf/

Citations: 0

Abstract

Purpose: This study applied machine learning (ML) and explainable artificial intelligence (XAI) to predict changes in HbA1c levels, a critical biomarker for monitoring glycemic control, within 12 months of initiating a new antidiabetic drug in patients diagnosed with type 2 diabetes. It also aimed to identify the predictors associated with these changes.

Patients and methods: Electronic health records (EHR) from 10,139 patients with type 2 diabetes in North Karelia, Finland, were used to train models that take randomized controlled trial (RCT)-derived HbA1c change values as predictors, creating offset models that combine RCT insights with real-world data. Several ML models, including linear regression (LR), multi-layer perceptron (MLP), ridge regression (RR), random forest (RF), and XGBoost (XGB), were evaluated using R² and RMSE. Baseline models used data recorded at or before drug initiation, while follow-up models also included the first post-drug HbA1c measurement, improving performance by incorporating dynamic patient data. Model performance was also compared to the expected HbA1c changes from clinical trials.
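The two evaluation metrics, and the idea of feeding a trial-derived expected change into a model as a predictor, can be illustrated with a minimal sketch. This is not the study's pipeline: the numbers are made up for illustration, the helper names (`rmse`, `r2`, `fit_line`) are hypothetical, and a one-predictor least-squares line stands in for the far richer models named above.

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error: the typical size of a prediction error."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Made-up illustrative values, NOT data from the study.
rct_expected = [-10.0, -10.0, -6.0, -6.0, -12.0, -12.0]  # trial-derived expected change
observed = [-14.0, -8.0, -5.0, -3.0, -18.0, -11.0]       # observed change per patient

# Reference prediction: use the trial value directly.
rct_only = rct_expected

# Simplest model that takes the trial value as a predictor.
a, b = fit_line(rct_expected, observed)
fitted = [a + b * x for x in rct_expected]

print(f"RCT only: R2={r2(observed, rct_only):.2f}, RMSE={rmse(observed, rct_only):.2f}")
print(f"Fitted:   R2={r2(observed, fitted):.2f}, RMSE={rmse(observed, fitted):.2f}")
```

The fit here is scored on the same rows it was trained on, so it can only look as good as or better than the raw trial value; a real comparison would score both on held-out patients.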

Results: The ML models outperformed the RCT model. The LR, MLP, and RR models performed comparably, whereas the RF and XGB models overfit. The follow-up MLP model outperformed the baseline MLP model, with higher R² scores (0.74, 0.65 versus 0.52, 0.54) and lower RMSE values (6.94, 7.62 versus 9.27, 9.50). Key predictors of HbA1c change included baseline and post-drug initiation HbA1c values, fasting plasma glucose, and HDL cholesterol.
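The size of the improvement from baseline to follow-up can be worked out directly from the reported figures. The sketch below assumes that the two values in each reported pair line up positionally (first with first, second with second); the abstract does not say what the two values in a pair represent, so that pairing is an assumption, as are the variable and function names.

```python
# Figures as reported in the abstract; each metric is given as a pair of values.
# Assumption: the pairs are aligned positionally across the two models.
baseline = {"r2": (0.52, 0.54), "rmse": (9.27, 9.50)}
follow_up = {"r2": (0.74, 0.65), "rmse": (6.94, 7.62)}


def relative_reduction(before, after):
    """Fraction by which a value dropped, relative to its starting point."""
    return (before - after) / before


rmse_reductions = [
    relative_reduction(b, f) for b, f in zip(baseline["rmse"], follow_up["rmse"])
]
r2_gains = [f - b for b, f in zip(baseline["r2"], follow_up["r2"])]

for i, (cut, gain) in enumerate(zip(rmse_reductions, r2_gains), start=1):
    print(f"pair {i}: RMSE down {cut:.1%}, R2 up {gain:.2f}")
```

Under that pairing, RMSE falls by roughly a quarter for the first pair and a fifth for the second, with R² gains of 0.22 and 0.11.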

Conclusion: Using EHR and ML models allows for the development of more realistic and individualized predictions of HbA1c changes, accounting for more diverse patient populations and their heterogeneous nature, offering more tailored and effective treatment strategies for managing T2D. The use of XAI provided insights into the influence of specific predictors, enhancing model interpretability and clinical relevance. Future research will explore treatment selection models.
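The abstract does not name the XAI technique used. As one common way to gauge how much a prediction model relies on a given predictor, the sketch below uses permutation importance: shuffle one feature column and measure how much the error grows. Everything in it is hypothetical, including the toy model, its coefficients, the feature names, and the patient rows.

```python
import random
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, y, feature, n_repeats=20, seed=0):
    """Mean increase in RMSE when the values of one feature are shuffled."""
    rng = random.Random(seed)
    base = rmse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(rmse(y, [predict(r) for r in permuted]) - base)
    return sum(increases) / n_repeats


def toy_model(row):
    """Hypothetical stand-in for a trained model; ignores age entirely."""
    return -0.6 * (row["baseline_hba1c"] - 53.0) - 0.8 * (row["fpg"] - 7.0)


# Made-up rows, NOT data from the study.
rows = [
    {"baseline_hba1c": 58.0, "fpg": 7.1, "age": 61.0},
    {"baseline_hba1c": 64.0, "fpg": 8.4, "age": 54.0},
    {"baseline_hba1c": 71.0, "fpg": 9.0, "age": 70.0},
    {"baseline_hba1c": 49.0, "fpg": 6.2, "age": 66.0},
    {"baseline_hba1c": 80.0, "fpg": 11.3, "age": 48.0},
    {"baseline_hba1c": 55.0, "fpg": 7.8, "age": 59.0},
]
y = [toy_model(r) for r in rows]  # targets generated by the toy model itself

importance = {
    name: permutation_importance(toy_model, rows, y, name)
    for name in ("baseline_hba1c", "fpg", "age")
}
for name, value in importance.items():
    print(f"{name}: {value:.3f}")
```

A feature the model never reads, such as `age` here, scores exactly zero, while features it depends on score above zero. Permutation importance gives a global ranking only; per-patient explanations need a different technique.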

Source journal

Clinical Epidemiology Medicine-Epidemiology

CiteScore: 6.30

Self-citation rate: 5.10%

Articles published: 169

Review time: 16 weeks

Journal description: Clinical Epidemiology is an international, peer-reviewed, open access journal. Clinical Epidemiology focuses on the application of epidemiological principles and questions relating to patients and clinical care in terms of prevention, diagnosis, prognosis, and treatment. Clinical Epidemiology welcomes papers covering these topics in the form of original research and systematic reviews. Clinical Epidemiology has a special interest in international electronic medical patient records and other routine health care data, especially as applied to safety of medical interventions, clinical utility of diagnostic procedures, understanding short- and long-term clinical course of diseases, clinical epidemiological and biostatistical methods, and systematic reviews. When considering submission of a paper utilizing publicly available data, authors should ensure that such studies add significantly to the body of knowledge and that they use appropriate validated methods for identifying health outcomes. The journal has launched special series describing existing data sources for clinical epidemiology, international health care systems and validation studies of algorithms based on databases and registries.