Erin M Tallon, David D Williams, Cintya Schweisberger, Colin Mullaney, Brent Lockee, Diana Ferro, Craig A Vandervelden, Mitchell S Barnes, Angelica Cristello Sarteau, Anna R Kahkoska, Susana R Patton, Sanjeev Mehta, Ryan McDonough, Marcus Lind, Leonard D'Avolio, Mark A Clements
JMIR Diabetes, volume 10, e69142. Journal article, published 2025-09-25. DOI: 10.2196/69142 (https://doi.org/10.2196/69142). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12463387/pdf/
Toward a Clinically Actionable, Electronic Health Record-Based Machine Learning Model to Forecast 90-Day Change in Hemoglobin A1c in Youth With Type 1 Diabetes: Feasibility and Model Development Study.
Background: Clinicians currently lack an effective means of identifying youth with type 1 diabetes (T1D) who are at risk of glycemic deterioration between diabetes clinic visits. As a result, their ability to identify youth who may optimally benefit from targeted interventions designed to address rising glycemic levels is limited. Although electronic health record (EHR)-based risk predictions have been used to forecast health outcomes in T1D, no study has investigated the potential for using EHR data to identify youth with T1D who will experience a clinically significant rise in glycated hemoglobin (HbA1c) of ≥0.3% (approximately 3 mmol/mol) between diabetes clinic visits.
Objective: We aimed to evaluate the feasibility of using routinely collected EHR data to develop a machine learning model to predict 90-day unit-change in HbA1c (in % units) in youth (aged 9-18 y) with T1D. We assessed our model's ability to augment clinical decision-making by identifying a percent change cut point that optimized identification of youth who would experience a clinically significant rise in HbA1c.
Methods: From a cohort of 2757 youth with T1D who received care from a network of pediatric diabetes clinics in the Midwestern United States (January 2012-August 2017), we identified 1743 youth with 9643 HbA1c observation windows (ie, 2 HbA1c measurements separated by 70-110 d, approximating the 90-day time interval between routine diabetes clinic visits). We used up to 5 years of youths' longitudinal EHR data to transform 17,466 features (demographics, laboratory results, vital signs, anthropometric measures, medications, diagnosis codes, procedure codes, and free-text data) for model training. We performed 3-fold cross-validation to train random forest regression models to predict 90-day unit-change in HbA1c(%).
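The observation-window definition in the Methods (2 HbA1c measurements separated by 70-110 d) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `observation_windows`, the tuple layout, and the sample values are hypothetical, and the sketch assumes that any pair of measurements in the 70-110 day range qualifies, not only consecutive ones.

```python
from datetime import date
from itertools import combinations


def observation_windows(measurements, min_days=70, max_days=110):
    """Return (start_date, end_date, unit_change) for each qualifying pair.

    `measurements` is a list of (date, hba1c_percent) tuples for one patient.
    Assumption: every pair whose gap is 70-110 days counts as a window.
    """
    ordered = sorted(measurements)  # chronological order
    windows = []
    for (d0, a0), (d1, a1) in combinations(ordered, 2):
        gap = (d1 - d0).days
        if min_days <= gap <= max_days:
            # Unit change in HbA1c, in % units (later minus earlier)
            windows.append((d0, d1, round(a1 - a0, 2)))
    return windows


# Hypothetical measurements for a single patient (not study data)
patient = [
    (date(2015, 1, 5), 8.1),
    (date(2015, 4, 6), 8.6),   # 91 days after the first measurement
    (date(2015, 6, 1), 8.4),   # 56 and 147 days after the earlier ones
    (date(2015, 7, 13), 9.0),  # 98 days after the April measurement
]
windows = observation_windows(patient)
for start, end, change in windows:
    print(start, end, change)
```

With these sample values, only the January-April and April-July pairs fall inside the 70-110 day range, so two windows are produced.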
Results: Across all 3 folds of our cross-validation model, the average root-mean-square error was 0.88 (95% CI 0.85-0.90). Predicted HbA1c (%) strongly correlated with true HbA1c (%) (r=0.79; 95% CI 0.78-0.80). The top 10 features impacting model predictions included postal code, various metrics related to HbA1c, and the frequency of a diagnosis code indicating difficulty with treatment engagement. At a clinically significant percent rise threshold of ≥0.3% (approximately 3 mmol/mol), our model's positive predictive value was 60.3%, indicating a 1.5-fold enrichment (relative to the observed frequency with which youth experienced this outcome [3928/9643, 40.7%]). Model sensitivity and positive predictive value improved when thresholds for clinical significance included smaller changes in HbA1c, whereas specificity and negative predictive value improved when thresholds required larger changes in HbA1c.
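The metrics in the Results can be made concrete with a short sketch. It shows how root-mean-square error, sensitivity, specificity, and predictive values follow from predicted and observed unit-changes at a ≥0.3% threshold, and it reproduces the enrichment arithmetic from the reported figures (60.3% positive predictive value against 3928/9643 observed frequency). The helper names and the toy arrays are hypothetical, and applying the same 0.3 cut point to both predicted and observed changes is an assumption, not a detail confirmed by the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted changes."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def threshold_metrics(y_true, y_pred, cut=0.3):
    """Treat a change >= cut as a 'rise' and tally the confusion matrix.

    Assumption: the same cut point is applied to predictions and outcomes.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        actual, flagged = t >= cut, p >= cut
        if flagged and actual:
            tp += 1
        elif flagged:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / len(y_true),
    }


# Toy unit-changes in HbA1c (% units); hypothetical, not study data
y_true = [0.5, 0.1, -0.2, 0.4, 0.3, 0.0, 0.6, -0.1]
y_pred = [0.4, 0.35, -0.1, 0.2, 0.3, 0.1, 0.5, 0.0]

metrics = threshold_metrics(y_true, y_pred)
toy_rmse = rmse(y_true, y_pred)
toy_enrichment = metrics["ppv"] / metrics["prevalence"]

# Enrichment from the figures reported in the abstract
reported_prevalence = 3928 / 9643          # about 0.407
reported_enrichment = 0.603 / reported_prevalence  # about 1.48, i.e. ~1.5-fold

print(metrics, round(toy_rmse, 3), round(reported_enrichment, 2))
```

Enrichment here is simply the positive predictive value divided by the observed frequency of the outcome, which is why a 60.3% value against a 40.7% base rate rounds to 1.5-fold.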
Conclusions: Routinely collected EHR data can be used to create an ML model for predicting unit-change in HbA1c between diabetes clinic visits among youth with T1D. Future work will focus on optimizing model performance and validating the model in additional cohorts and in other diabetes clinics.