Development and validation of risk prediction models for large for gestational age infants using logistic regression and two machine learning algorithms
Ning Wang, Haonan Guo, Yingyu Jing, Yifan Zhang, Bo Sun, Xingyan Pan, Huan Chen, Jing Xu, Mengjun Wang, Xi Chen, Lin Song, Wei Cui
Journal of Diabetes, 15(4):338–348. Journal Article. Published 2023-03-08. DOI: 10.1111/1753-0407.13375
https://onlinelibrary.wiley.com/doi/10.1111/1753-0407.13375
Citations: 1
Abstract
Background
Large for gestational age (LGA) is an adverse pregnancy outcome that endangers the life and health of both mother and offspring. We aimed to establish prediction models for LGA in late pregnancy.
Methods
Data were obtained from an established cohort of 1285 pregnant Chinese women. LGA was defined as a birth weight above the 90th percentile of the Chinese birth weight distribution for newborns of the same sex and gestational age. Women with gestational diabetes mellitus (GDM) were classified into three subtypes according to indexes of insulin sensitivity and insulin secretion. Models were built with logistic regression and with decision tree and random forest algorithms, and were validated on an internal validation set.
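The LGA definition above amounts to a lookup against a sex- and gestational-age-specific reference threshold. A minimal sketch of that rule follows. The table `P90_GRAMS`, its values, and the function name `is_lga` are hypothetical placeholders chosen for illustration; they are not taken from the Chinese reference standard or from the study.

```python
from typing import Dict, Tuple

# Hypothetical 90th-percentile birth weights in grams, keyed by
# (sex, completed gestational weeks). Placeholder values for illustration
# only; they are NOT the Chinese reference standard used in the study.
P90_GRAMS: Dict[Tuple[str, int], float] = {
    ("male", 38): 3700.0,
    ("male", 39): 3850.0,
    ("male", 40): 3950.0,
    ("female", 38): 3600.0,
    ("female", 39): 3750.0,
    ("female", 40): 3850.0,
}


def is_lga(sex: str, gestational_weeks: int, birth_weight_g: float) -> bool:
    """Return True when the birth weight is strictly above the 90th
    percentile for newborns of the same sex and gestational age."""
    key = (sex, gestational_weeks)
    if key not in P90_GRAMS:
        # Refuse to guess when the reference table has no matching entry.
        raise KeyError(f"no reference percentile for {key}")
    return birth_weight_g > P90_GRAMS[key]
```

A weight exactly equal to the threshold is not labelled LGA here, because the definition uses a strict ">90th percentile" comparison.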
Results
A total of 139 newborns were diagnosed as LGA after birth. The logistic regression model, which consisted of eight commonly used clinical indicators (including lipid profile) and GDM subtypes, had an area under the curve (AUC) of 0.760 (95% confidence interval [CI] 0.706–0.815) in the training set and 0.748 (95% CI 0.659–0.837) in the internal validation set. The two machine learning models included all the variables. In the training and internal validation sets, respectively, the AUCs were 0.813 (95% CI 0.786–0.839) and 0.779 (95% CI 0.735–0.824) for the decision tree model, and 0.854 (95% CI 0.831–0.877) and 0.808 (95% CI 0.766–0.850) for the random forest model.
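For readers unfamiliar with how an AUC and its 95% CI can be obtained from predicted risks, the sketch below computes the AUC in its rank-based (Mann-Whitney) form and a percentile bootstrap interval. The abstract does not state how the reported intervals were calculated, so the bootstrap is an assumption made here for illustration (an analytic method such as DeLong's is a common alternative). The function names and the toy data in the usage note are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) using a
    percentile bootstrap over cases. Assumed method, not the study's."""
    point = auc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper
```

With the toy inputs `labels = [0, 0, 1, 1]` and `scores = [0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ranked correctly, so `auc` returns 0.75. The pairwise loop is quadratic in sample size, which is acceptable for a cohort of this scale but would need a rank-based implementation for much larger data.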
Conclusion
We established and validated three LGA risk prediction models that identify pregnant women at high risk of LGA at the early stage of the third trimester. The models showed good predictive power and could guide early prevention strategies.
About the journal:
Journal of Diabetes (JDB) devotes itself to diabetes research, therapeutics, and education. It aims to involve researchers and practitioners in a dialogue between East and West via all aspects of epidemiology, etiology, pathogenesis, management, complications and prevention of diabetes, including the molecular, biochemical, and physiological aspects of diabetes. The Editorial team is international with a unique mix of Asian and Western participation.
The Editors welcome submissions in the form of original research articles, images, novel case reports and correspondence, and will solicit reviews, point-counterpoint, commentaries, editorials, news highlights, and educational content.