Comparison Algorithm for Diabetes Classification with Consideration of Mutual Information and Information Feature

Rahmat Ramadhani, T. H. Saragih, Muhammad Itqan Mazdadi, M. Muliadi
Journal: Jurnal Komputasi
Published: 2023-04-30
DOI: 10.23960/komputasi.v11i1.6649 (https://doi.org/10.23960/komputasi.v11i1.6649)
Citations: 0

Abstract

Diabetes is a prevalent disease in humans that is caused by excessive sugar levels in the body. If left untreated, it can lead to severe consequences such as paralysis, decay in certain parts of the body, and even death. Unfortunately, early detection of diabetes is difficult, and many cases go untreated until it is too late. However, advances in technology have opened up new possibilities for early detection and treatment of diabetes. One such approach is classification, a commonly used method in the field of Computer Science. Classification is used in various fields, including health, agriculture, and animal diseases, to draw conclusions based on input data using cause-and-effect relationships. Many different learning concepts and methods can be used in classification, with the Decision Tree being one of the most popular examples. This study compares several classification methods, namely Decision Tree, Random Forest, AdaBoost, and Stochastic Gradient Boost, with feature selection carried out using Mutual Information (MI) and Importance Feature (IF). The study aims to evaluate the effectiveness of these methods and the influence of feature selection on their performance. The results indicate that feature selection using Mutual Information and Importance Feature can improve classification accuracy in some methods, particularly Random Forest, AdaBoost, and Stochastic Gradient Boost. However, the Decision Tree algorithm did not show any improvement in accuracy after feature selection. The best classification accuracy was achieved with the Stochastic Gradient Boost method using the original dataset without feature selection, while the Random Forest method showed the highest accuracy after using all the features. Overall, the results suggest that feature selection can be a useful technique for improving the performance of classification algorithms in diabetes prediction. The study suggests that future research could investigate other classification methods, such as Neural Network or Deep Learning, and use optimization algorithms like Genetic Algorithm or Particle Swarm Optimization to improve feature selection results.
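To make the Mutual Information step concrete, the following is a minimal sketch of MI-based feature ranking, not the authors' implementation. The abstract does not name the dataset, its features, or the selection threshold, so the feature names (`glucose_high`, `bmi_high`, `noise`), the toy values, and the helper functions are all hypothetical. The sketch assumes discrete features and estimates I(X; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) p(y))) from counts.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of class labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # (c/n) is p(x, y); c*n / (px*py) equals p(x, y) / (p(x) p(y))
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature name, MI score) pairs, highest score first."""
    scores = {name: mutual_information(vals, labels)
              for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy data for illustration only (not from the study).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "glucose_high": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "bmi_high":     [0, 0, 0, 1, 0, 1, 1, 1],  # partially informative
    "noise":        [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

A feature selection step would then keep the top-ranked features and retrain each classifier on that subset. Continuous measurements would first need to be discretized, or scored with an estimator designed for continuous inputs such as scikit-learn's `mutual_info_classif`.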