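Comparison of Support Vector Machine Performance with Oversampling and Outlier Handling in Diabetic Disease Detection Classification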

Firda Yunita Sari, Maharani Sukma Kuntari, Hani Khaulasari, Winda Ari Yati
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer
Journal Article, published 2023-07-17
DOI: 10.30812/matrik.v22i3.2979 (https://doi.org/10.30812/matrik.v22i3.2979)
Citations: 0

Abstract

Diabetes mellitus is a chronic metabolic disease characterized by the body's inability to process carbohydrates and fats properly, resulting in high blood glucose levels. It is the sixth leading cause of death worldwide. Classifying diabetes data makes the disease easier to predict, and advances in technology now allow diabetes to be detected with machine learning methods. One such method is the support vector machine (SVM), which is very effective for classification because it can quickly separate positive and negative data points. This study aimed to obtain the best SVM classification model for detecting diabetes, judged by accuracy, sensitivity, and precision, by adding the Synthetic Minority Over-sampling Technique (SMOTE) and outlier handling. SMOTE was applied to handle class imbalance. SVM aims to produce a separating function, called a hyperplane, that fits all input data with the smallest possible error. The data studied were diabetes indicators, consisting of 8 factor (predictor) variables and 1 class variable. The test results show that the SVM-SMOTE scenario produces the best accuracy: with the RBF kernel it reached an accuracy of 88% (an error of 12%) using a 90:10 split between training and test data, together with a precision of 0.880 and a sensitivity of 0.880. The results show that classification is more accurate when SVM is combined with imbalanced-data handling (SMOTE), and that the proportion of training to test data influences the outcome of a test scenario.
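The abstract names outlier handling, SMOTE, an RBF-kernel SVM, and three evaluation scores without showing how they work. The following is a minimal, standard-library-only sketch of those pieces, not the authors' code. Assumptions: outliers are detected with the Tukey 1.5 × IQR rule (the abstract does not say which outlier method was used), and the function names `drop_outliers`, `smote`, `rbf_kernel`, and `scores` as well as the defaults `k=5` and `gamma=0.1` are hypothetical. The SVM itself would normally come from a library such as scikit-learn's `SVC(kernel="rbf")`; only the RBF kernel it relies on is shown here.

```python
import math
import random
import statistics


def iqr_bounds(values):
    """Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def drop_outliers(rows, col):
    """Keep only rows whose feature at index `col` lies inside the IQR fences."""
    lo, hi = iqr_bounds([r[col] for r in rows])
    return [r for r in rows if lo <= r[col] <= hi]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (basic SMOTE).

    Each new point is base + gap * (neighbour - base), where base is a random
    minority sample, neighbour is one of its k nearest minority neighbours,
    and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def rbf_kernel(x, z, gamma=0.1):
    """RBF (Gaussian) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * math.dist(x, z) ** 2)


def scores(y_true, y_pred, positive=1):
    """Return (accuracy, precision, sensitivity) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    return accuracy, precision, sensitivity


if __name__ == "__main__":
    minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
    print(len(smote(minority, n_new=3, k=2)))  # 3
    print(scores([1, 0, 1, 1], [1, 0, 0, 1]))  # (0.75, 1.0, 0.6666666666666666)
```

One plausible order for these steps (an assumption; the abstract does not specify it) is: split the data 90:10 into training and test sets, remove outliers and oversample only the training portion, fit the RBF-kernel SVM, then compute the three scores on the untouched test portion. Oversampling before the split would let synthetic points derived from test samples leak into training and inflate the reported scores.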