Casey Choong PhD, Neena Xavier MD, Beverly Falcon PhD, Hong Kan PhD, Ilya Lipkovich PhD, Callie Nowak MPH, Margaret Hoyt PhD, Christy Houle PhD, Scott Kahan MD
Diabetes, Obesity & Metabolism, volume 27, issue 6, pages 3061-3071. Published 2025-03-11. DOI: 10.1111/dom.16311. https://onlinelibrary.wiley.com/doi/10.1111/dom.16311
Identifying individuals at risk for weight gain using machine learning in electronic medical records from the United States
Aims
Numerous risk factors for the development of obesity have been identified, yet the aetiology is not well understood. Traditional statistical methods for analysing observational data are limited by the volume and characteristics of large datasets. Machine learning (ML) methods can analyse large datasets to extract novel insights on risk factors for obesity. This study predicted adults at risk of a ≥10% increase in index body mass index (BMI) within 12 months using ML and a large electronic medical records (EMR) database.
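The outcome described above (a ≥10% increase over index BMI within 12 months) can be made concrete with a minimal sketch. The function name, the record layout (a list of dated follow-up BMI values) and the choice to count any qualifying measurement inside a 365-day window are illustrative assumptions; the abstract does not state how the study operationalised the outcome.

```python
from datetime import date


def gained_10_percent(index_date, index_bmi, followups,
                      window_days=365, threshold=0.10):
    """Flag a patient whose BMI rose by at least `threshold` over index BMI.

    followups: list of (measurement_date, bmi) tuples (hypothetical layout).
    Only measurements taken after the index date and within `window_days`
    are considered (an assumption, not the study's stated rule).
    """
    for measured_on, bmi in followups:
        days_since_index = (measured_on - index_date).days
        if not 0 < days_since_index <= window_days:
            continue  # outside the follow-up window
        relative_change = (bmi - index_bmi) / index_bmi
        if relative_change >= threshold:
            return True
    return False


# Hypothetical patient: index BMI 30.0, two follow-up measurements.
index = date(2020, 1, 1)
label = gained_10_percent(
    index, 30.0, [(date(2020, 6, 1), 31.0), (date(2020, 11, 1), 33.6)]
)
```

Labels produced this way would serve as the binary target for the classifiers listed in the Materials and Methods.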
Materials and Methods
ML algorithms were used with EMR from Optum's de-identified Market Clarity Data, a US database. Models included extreme gradient boosting (XGBoost), random forest, simple logistic regression (no feature selection procedure) and two penalised logistic models (Elastic Net and Least Absolute Shrinkage and Selection Operator [LASSO]). Performance metrics included the area under the curve (AUC) of the receiver operating characteristic curve (used to determine the best-performing model), average precision, Brier score, accuracy, recall, positive predictive value, Youden index, F1 score, negative predictive value and specificity.
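Several of the threshold-based metrics listed above follow directly from the confusion matrix, and the Brier score from the predicted probabilities. The sketch below uses plain Python and a 0.5 decision threshold; the function name, the threshold and the toy data are assumptions for illustration, not the study's code or settings. AUC and average precision are rank-based and are not computed here.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute confusion-matrix metrics and the Brier score.

    y_true: list of 0/1 outcomes; y_prob: predicted probabilities.
    Ratios with a zero denominator are reported as 0.0.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # sensitivity
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = ratio(tn, tn + fn)           # negative predictive value
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": ratio(2 * ppv * recall, ppv + recall),
        "youden": recall + specificity - 1,
        # Brier score: mean squared error of the predicted probabilities.
        "brier": sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true),
    }


# Toy data (hypothetical): three patients with the outcome, five without.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
metrics = classification_metrics(toy_true, toy_prob)
```

Unlike AUC, every metric here except the Brier score depends on the chosen threshold, which is one reason a threshold-free measure is commonly used to compare models.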
Results
The XGBoost model performed best 12 months post-index, with an AUC of 0.75. Lower baseline BMI, having any emergency room visit during the study period, no diabetes mellitus, no lipid disorders and younger age were among the top predictors for ≥10% increase in index BMI.
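An AUC of 0.75 can be read as the probability that a randomly chosen patient who gained ≥10% receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes AUC from that pairwise definition in plain Python; the function name and the toy scores are hypothetical, and the quadratic loop is meant for illustration rather than for datasets of EMR size.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    non_cases = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")

    correct = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                correct += 1.0
            elif case_score == other_score:
                correct += 0.5
    return correct / (len(cases) * len(non_cases))


# Toy scores (hypothetical): 14 of the 15 case/non-case pairs are ordered correctly.
auc = roc_auc([1, 1, 1, 0, 0, 0, 0, 0],
              [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1])
```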
Conclusion
The current study demonstrates an ML approach applied to EMR to identify those at risk for weight gain over 12 months. Providers may use this risk stratification to prioritise prevention strategies or earlier obesity intervention.
About the journal:
Diabetes, Obesity and Metabolism is primarily a journal of clinical and experimental pharmacology and therapeutics covering the interrelated areas of diabetes, obesity and metabolism. The journal prioritises high-quality original research that reports on the effects of new or existing therapies, including dietary, exercise and lifestyle (non-pharmacological) interventions, in any aspect of metabolic and endocrine disease, either in humans or animal and cellular systems. ‘Metabolism’ may relate to lipids, bone and drug metabolism, or broader aspects of endocrine dysfunction. Preclinical pharmacology, pharmacokinetic studies, meta-analyses and those addressing drug safety and tolerability are also highly suitable for publication in this journal. Original research may be published as a main paper or as a research letter.