Comparative Analysis of ADASYN-SVM and SMOTE-SVM Methods on the Detection of Type 2 Diabetes Mellitus

Scientific Journal of Informatics. Pub Date: 2021-11-30. DOI: 10.15294/sji.v8i2.32484

Nur Ghaniaviyanto Ramadhan

Citations: 11

Abstract

Most people with diabetes worldwide have type 2 diabetes. Diabetes can be detected early, before complications develop, by having blood sugar and insulin levels checked by a doctor. Beyond such checks, people with diabetes can also be grouped on the basis of data from diabetes examination results. However, most health examination data contain several parameters that are difficult for the public to understand, a problem that automatic classification can address. A further problem is the imbalance between the amount of data for diabetics and for non-diabetics. It can be addressed by balancing the data, either by increasing the share of the under-represented class (oversampling) or by decreasing the share of the over-represented class (undersampling). Purpose: This study aims to detect type 2 diabetes mellitus using an SVM classification model and to determine which of two data balancing techniques, SMOTE or ADASYN, gives the better result. Methods/Study design/approach: The research starts with collecting the diabetes dataset, which is then cleaned by checking for null values. Two oversampling methods are then applied to determine which is the more appropriate. After oversampling, the data are classified with a support vector machine model to obtain the accuracy. Result/Findings: The ADASYN-SVM method outperforms SMOTE-SVM, with an accuracy of 87.3% versus 85.4%. Novelty/Originality/Value: The data used in this study come from the Karya Medika clinic, Indonesia, and contain parameters related to type 2 diabetes.
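
The difference between the two oversampling methods named in the abstract can be illustrated with a small sketch. This is not the study's code: the function names, the toy two-feature points, and the choice of k are all assumptions made for illustration, the clinic data are not reproduced, and the SVM step is omitted. SMOTE picks minority samples uniformly and interpolates toward a nearby minority neighbour; ADASYN first weights each minority sample by the share of majority-class points among its nearest neighbours, so harder, borderline samples receive more synthetic points. Interpolating toward the nearest minority neighbours in the ADASYN part is a simplification. Libraries such as imbalanced-learn provide `SMOTE` and `ADASYN` classes that would normally be used instead.

```python
import math
import random


def nearest_indices(index, points, k):
    """Indices of the k points closest to points[index], excluding index itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def interpolate(a, b, rng):
    """Random point on the line segment from a to b."""
    u = rng.random()
    return tuple(ai + u * (bi - ai) for ai, bi in zip(a, b))


def smote(minority, n_new, k=2, seed=0):
    """SMOTE-style sketch: every minority point is equally likely to be the base."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_indices(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def adasyn_counts(minority, majority, n_new, k=2):
    """ADASYN-style allocation: synthetic samples per minority point, proportional
    to the fraction of majority-class points among its k nearest neighbours."""
    points = minority + majority  # indices < len(minority) are minority points
    ratios = []
    for i in range(len(minority)):
        neighbours = nearest_indices(i, points, k)
        ratios.append(sum(j >= len(minority) for j in neighbours) / k)
    total = sum(ratios)
    if total == 0:
        raise ValueError("no minority point has a majority-class neighbour")
    # Rounding means the counts may not sum exactly to n_new in general.
    return [round(r / total * n_new) for r in ratios]


def adasyn(minority, majority, n_new, k=2, seed=0):
    """ADASYN-style sketch: generate the allocated number of points per base sample."""
    rng = random.Random(seed)
    counts = adasyn_counts(minority, majority, n_new, k)
    synthetic = []
    for i, count in enumerate(counts):
        for _ in range(count):
            j = rng.choice(nearest_indices(i, minority, k))
            synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


# Toy data (invented): three minority points far from the majority class,
# and one minority point sitting among majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (3.0, 3.0)]
majority = [(3.1, 3.2), (2.9, 2.8), (3.3, 3.0), (5.0, 5.0),
            (5.2, 4.9), (4.8, 5.1), (5.1, 5.3), (4.9, 4.7)]
n_new = len(majority) - len(minority)  # samples needed to balance the classes

counts = adasyn_counts(minority, majority, n_new)
smote_points = smote(minority, n_new)
adasyn_points = adasyn(minority, majority, n_new)
print(counts)
```

On this toy data with k=2, `adasyn_counts` returns `[0, 0, 0, 4]`: all four synthetic samples are allocated to the one minority point surrounded by majority points, whereas the SMOTE sketch draws its base points uniformly from all four.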

Source journal

Scientific Journal of Informatics

Self-citation rate

0.00%

Articles published

Review time

24 weeks