Zhikui Tian, JiZhong Zhang, Yadong Fan, Xuan Sun, Dongjun Wang, XiaoFei Liu, GuoHui Lu, Hongwu Wang
BMC Medical Informatics and Decision Making. 2025;25(1):90. Published 2025-02-18. doi:10.1186/s12911-025-02932-w
Diabetic peripheral neuropathy detection of type 2 diabetes using machine learning from TCM features: a cross-sectional study.
Aims: Diabetic peripheral neuropathy (DPN) is the most common complication of diabetes mellitus. Early identification of individuals at high risk of DPN is essential for successful early intervention. Tongue diagnosis, one of the four diagnostic methods of traditional Chinese medicine (TCM), still lacks dedicated algorithms for analyzing TCM symptoms and tongue features. This study aims to develop TCM-based machine learning (ML) models to predict the risk of DPN in patients with type 2 diabetes mellitus (T2DM).
Methods: A total of 4723 patients were included in the analysis (4430 with T2DM and 293 with DPN). Tongue images were acquired with TFDA-1 during a questionnaire survey. A LASSO (least absolute shrinkage and selection operator) logistic regression model with fivefold cross-validation was used to select imaging features, which were then further screened by best subset selection. The synthetic minority oversampling technique (SMOTE) was applied to address class imbalance and reduce the bias it can introduce. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC). Four ML algorithms, namely logistic regression (LR), random forest (RF), support vector classifier (SVC), and light gradient boosting machine (LGBM), were used to build predictive models for DPN. The better-performing classifiers were then used to rank the importance of covariates for DPN.
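The oversampling step can be illustrated with a minimal, dependency-free sketch of SMOTE-style interpolation. The function name `smote`, the neighbour count `k=5` and the seed are illustrative assumptions: the abstract does not report the authors' SMOTE settings or software, and real analyses usually rely on an existing implementation such as `SMOTE` in imbalanced-learn (whose `k_neighbors` also defaults to 5). The sketch also assumes numeric feature vectors; each synthetic DPN case is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE-style oversampler (illustrative sketch only).

    minority: list of numeric feature vectors from the minority (DPN) class.
    Returns n_synthetic new vectors, each interpolated between a random
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority-class neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # position on the segment, in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic
```

With the abstract's counts, DPN cases make up about 6.2% of the sample (293/4723), so fully balancing the classes would take 4430 - 293 = 4137 synthetic DPN cases, e.g. `smote(dpn_features, 4137)` where `dpn_features` is a hypothetical list of DPN feature vectors. Whether the authors oversampled to a 1:1 ratio is not stated, so that figure is only an illustration. Oversampling is normally applied to the training data only, so that synthetic points do not leak into the evaluation set.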
Results: The RF model performed best, with an accuracy of 0.767, precision of 0.718, recall of 0.874, F1 score of 0.789, and AUC of 0.77. The LGBM model had the highest recall (0.879). The five most important RF features were age, sweating, dark red tongue, insomnia, and smoking; the five most important LGBM features were age, yellow coating, loose teeth, smoking, and insomnia.
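The reported metrics follow the standard confusion-matrix definitions, sketched below in plain Python with DPN assumed to be the positive class (label 1). The helper names are hypothetical, and the abstract does not say how the metrics were aggregated (for example, on a held-out split or averaged over folds). The AUC helper uses the rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = DPN, 0 = no DPN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a consistency check, `f1_from(0.718, 0.874)` gives about 0.788, which agrees with the reported RF F1 score of 0.789 within the rounding of the reported precision and recall. The pairwise AUC loop is quadratic in the number of cases; a rank-based formula is preferable for large samples.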
Conclusions: This cross-sectional study demonstrates that RF and LGBM models can use TCM symptoms and tongue features to identify T2DM patients at high risk of DPN. The key features identified, such as age, tongue coating, and other symptoms, may help in developing preventive measures for T2DM patients.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.