Mining Significant Features of Diabetes through Employing Various Classification Methods
Nurjahan, Mohammad Abu Tareq Rony, Md. Shahriare Satu, Md. Whaiduzzaman
2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), published 2021-02-27. DOI: 10.1109/ICICT4SD50815.2021.9397006
Diabetes is a chronic disease that occurs when blood glucose levels become very high, and it causes a number of serious complications in the affected patient's body. Early detection can reduce critical outcomes such as death and lower the chance of losing vital organs to the disease. The aim of this study is to construct a predictive model by examining several machine learning techniques, namely Decision Tree, K-Nearest Neighbour, Naive Bayes, Support Vector Machine, Logistic Regression, eXtreme Gradient Boosting, Multi-Layer Perceptron and Random Forest, on two datasets of diabetes patients: the Pima Indian Diabetes dataset and the Sylhet Diabetes Hospital dataset. Several widely used feature subset selection procedures were also applied to eliminate unnecessary attributes. Performance was measured with 10-fold cross-validation. On the Sylhet Diabetes Hospital dataset, Random Forest achieved the highest accuracy (97.5%), F-measure (97.5%) and area under the Receiver Operating Characteristic curve (AUC, 99.80%) with the Gain Ratio Attribute Evaluation feature selection technique. On the Pima Indian dataset, Logistic Regression achieved the highest accuracy (77.7%) and F-measure (77%) with Information Gain Attribute Evaluation, and the highest AUC (83%) with both Correlation-based Feature Selection Subset Evaluation and Correlation Attribute Evaluation.
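The best feature subsets reported here came from Information Gain and Gain Ratio attribute evaluation. The sketch below ranks attributes by both scores; it is an illustration, not the authors' implementation. The abstract does not name the tool used, although the evaluator names resemble Weka's attribute evaluators. The sketch assumes every attribute is already nominal. Numeric attributes, such as those in the Pima data, would need to be discretized first. The functions, attribute names (`attr_a`, `attr_b`, `attr_c`) and toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(column, labels):
    """Information gain: H(class) - H(class | attribute) for one nominal attribute."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(column).items():
        # Class labels of the rows that take this attribute value
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(column, labels):
    """Gain ratio: information gain divided by the split information H(attribute)."""
    split_info = entropy(column)
    return info_gain(column, labels) / split_info if split_info > 0 else 0.0


def rank_attributes(rows, labels, names, score):
    """Score every attribute column with `score` and sort from best to worst."""
    columns = list(zip(*rows))
    scored = [(name, score(list(col), labels)) for name, col in zip(names, columns)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data (not taken from the Pima or Sylhet datasets)
names = ["attr_a", "attr_b", "attr_c"]
rows = [("yes", "m", "1"), ("yes", "f", "2"), ("no", "m", "3"), ("no", "f", "4")]
labels = ["pos", "pos", "neg", "neg"]

print(rank_attributes(rows, labels, names, info_gain))
print(rank_attributes(rows, labels, names, gain_ratio))
```

In this toy data, `attr_c` behaves like a row ID. Every value isolates one row, so its information gain equals that of the genuinely predictive `attr_a` (1.0 bit). Gain ratio divides by the attribute's own entropy (2 bits here), which halves its score to 0.5. This normalization is the usual reason to prefer gain ratio when attributes differ in how many distinct values they take.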