EDAS Based Selection of Machine Learning Algorithm for Diabetes Detection

2020 9th International Conference System Modeling and Advancement in Research Trends (SMART) | Pub Date: 2020-12-04 | DOI: 10.1109/SMART50582.2020.9337118

S. Sharma, Bhavya Sharma


Citations: 0

Abstract



Diabetes is a global health concern, and its diagnosis and treatment are among the prime challenges for the medical fraternity: the disease can be controlled but not cured, so the sooner it is diagnosed, the better for the patient. Using machine learning for timely classification of diabetes therefore plays a vital role in protecting patients from life-threatening complications later on. Various classification techniques are available in Machine Learning (ML), including Support Vector Machines (SVM), Random Forest, the Naïve Bayes classifier, Linear Regression (LR), and the K-Nearest Neighbor (KNN) algorithm. The question is which of these techniques identifies this sensitive disorder promptly and accurately. When predicting diabetes with any machine learning algorithm, accuracy, specificity, and sensitivity are among the important parameters. Improving these parameters requires an understanding of the dataset under consideration, i.e. whether it contains missing values or outliers; if missing values exist, data imputation techniques have to be applied to the dataset to strengthen prediction accuracy. In this work, the well-known Pima Indian dataset from the UCI repository was subjected to data imputation techniques to handle the missing values present in it (tabulated in Table 1). The machine learning techniques listed above were then applied and compared on the basis of various parameters, such as accuracy, sensitivity, and specificity. Choosing the best algorithm requires comparing multiple criteria together, which is quite challenging. This work therefore applies Evaluation Based on Distance from Average Solution (EDAS), a Multi-Criteria Decision-Making (MCDM) technique. Applying EDAS to the performance evaluation statistics (speed, accuracy, specificity, and sensitivity) of the classification algorithms, namely the Naïve Bayes (NB) classifier, Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), and Linear Regression (LR), ranks Naïve Bayes (NB) as the best classifier and Random Forest (RF) as the second-best alternative for predicting diabetes on the Pima Indian dataset.
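
To make the ranking step concrete, the following is a minimal sketch of the standard EDAS procedure in Python with NumPy. It is not the authors' implementation: the function name `edas_scores`, the equal criterion weights, and the demo numbers are all assumptions made for illustration, and the demo matrix does not reproduce any result from the paper. Criteria where larger is better (for example accuracy) are marked as benefit criteria, and criteria where smaller is better (for example running time) as cost criteria.

```python
import numpy as np


def edas_scores(matrix, weights, benefit):
    """Return the EDAS appraisal score of each alternative (higher is better).

    matrix  : rows are alternatives, columns are criteria (positive values assumed)
    weights : one weight per criterion
    benefit : True where larger values are better, False for cost criteria
    """
    x = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    is_benefit = np.asarray(benefit, dtype=bool)

    # Average solution: the mean of every criterion over all alternatives.
    av = x.mean(axis=0)

    # Signed distance from the average, oriented so that positive means "better".
    diff = np.where(is_benefit, x - av, av - x)

    # Positive and negative distances from the average (PDA, NDA).
    pda = np.maximum(diff, 0.0) / av
    nda = np.maximum(-diff, 0.0) / av

    # Weighted sums of the distances for each alternative.
    sp = pda @ w
    sn = nda @ w

    # Normalise, guarding against the case where every distance is zero.
    nsp = sp / sp.max() if sp.max() > 0 else np.zeros_like(sp)
    nsn = 1.0 - sn / sn.max() if sn.max() > 0 else np.ones_like(sn)

    # Appraisal score: the mean of the two normalised values.
    return (nsp + nsn) / 2.0


# Invented numbers for illustration only; NOT results from the paper.
# Column 0 is a benefit criterion (e.g. accuracy), column 1 a cost criterion (e.g. time).
demo = [[0.9, 1.0], [0.7, 2.0], [0.8, 3.0]]
scores = edas_scores(demo, weights=[0.5, 0.5], benefit=[True, False])
ranking = np.argsort(-scores)  # indices of the alternatives, best first
print(scores, ranking)
```

In this toy example the first alternative is better than average on both criteria, so it receives the highest appraisal score. With real data, each row would hold one classifier's measured statistics, and the weights would reflect how much each criterion matters to the decision maker.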

