A Risk Prediction Model for Type 2 Diabetes Based on Weighted Feature Selection of Random Forest and XGBoost Ensemble Classifier
Zhongxian Xu, Zhiliang Wang
2019 Eleventh International Conference on Advanced Computational Intelligence (ICACI), June 2019
DOI: 10.1109/ICACI.2019.8778622
Type 2 diabetes mellitus is a severe chronic disease that threatens human health and has a high incidence worldwide. Effective prediction models are needed to diagnose and prevent diabetes in time. Data mining techniques with classification capability have become increasingly important in medical diagnosis. This paper proposes a risk prediction model for type 2 diabetes based on ensemble learning. In the proposed model, a weighted feature selection algorithm based on random forest (RF-WFS) selects the optimal feature subset, and an extreme gradient boosting (XGBoost) classifier performs the classification. The effectiveness of the method is validated by comparing various performance metrics across several comparative experiments. The method also achieves higher prediction accuracy than the other classification algorithms evaluated (C4.5, Naive Bayes, AdaBoost, and Random Forest). Validation results on the UCI Pima Indians Diabetes dataset show that the model achieves better accuracy and classification performance than other results reported in the literature. These results show that the model can be effective for diagnosing diabetes at an early stage.
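The abstract does not specify how RF-WFS computes its feature weights or how the XGBoost classifier is configured. The following is therefore only a minimal Python sketch of the general two-stage idea under explicit assumptions. Feature weights are taken to be scikit-learn's impurity-based random forest importances averaged over a few seeds. The number of top-weighted features to keep is chosen by 5-fold cross-validation. scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example needs no extra package. Synthetic data shaped like the Pima dataset (768 samples, 8 features) replaces the real data. The function names `rf_feature_weights`, `select_features`, and `run_pipeline` are hypothetical, and the baseline comparison uses an entropy-criterion CART tree as an approximation of C4.5.

```python
"""Sketch: random-forest-weighted feature selection + gradient boosting.

Assumptions (not from the paper): RF weights are impurity importances averaged
over several seeds; k is chosen by 5-fold CV; sklearn's GradientBoostingClassifier
stands in for XGBoost; synthetic data stands in for the Pima dataset.
"""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def rf_feature_weights(X, y, n_runs=3, n_estimators=100):
    """Average random-forest impurity importances over several seeds (hypothetical RF-WFS stand-in)."""
    weights = np.zeros(X.shape[1])
    for seed in range(n_runs):
        rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        weights += rf.fit(X, y).feature_importances_
    return weights / n_runs  # each forest's importances sum to 1, so the average does too


def select_features(X, y, weights, cv=5):
    """Keep the k top-weighted features that maximise CV accuracy of the booster."""
    order = np.argsort(weights)[::-1]  # feature indices, heaviest weight first
    best_k, best_score = 1, -np.inf
    for k in range(1, X.shape[1] + 1):
        booster = GradientBoostingClassifier(random_state=0)
        score = cross_val_score(booster, X[:, order[:k]], y, cv=cv).mean()
        if score > best_score:
            best_k, best_score = k, score
    return order[:best_k], best_score


def run_pipeline(random_state=42):
    # Synthetic stand-in shaped like Pima: 768 samples, 8 features, binary label.
    X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                               n_redundant=2, random_state=random_state)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)

    # Feature weighting and selection use the training split only (no test leakage).
    weights = rf_feature_weights(X_tr, y_tr)
    selected, cv_acc = select_features(X_tr, y_tr, weights)

    # Gradient boosting stands in for XGBoost; xgboost.XGBClassifier offers the
    # same fit/score interface and could be substituted here.
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X_tr[:, selected], y_tr)
    test_acc = model.score(X_te[:, selected], y_te)

    # Baselines on all features; the entropy-criterion CART tree only approximates C4.5.
    baselines = {
        "C4.5 (approx.)": DecisionTreeClassifier(criterion="entropy", random_state=0),
        "Naive Bayes": GaussianNB(),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(random_state=0),
    }
    baseline_acc = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
                    for name, clf in baselines.items()}
    return {"weights": weights, "selected": selected, "cv_accuracy": cv_acc,
            "test_accuracy": test_acc, "baselines": baseline_acc}


if __name__ == "__main__":
    result = run_pipeline()
    print("selected feature indices:", sorted(result["selected"].tolist()))
    print(f"boosted model test accuracy: {result['test_accuracy']:.3f}")
    for name, acc in result["baselines"].items():
        print(f"{name}: {acc:.3f}")
```

Running the script prints the selected feature indices and the held-out accuracy of the boosted model and of each baseline. Because the data are synthetic, these numbers say nothing about the results reported in the paper. To try the sketch on the real Pima data, one would load its eight attributes and class label into `X` and `y` in place of `make_classification`.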