Comparison of the C5.0 Algorithm and the CART Algorithm in Stroke Classification

Indah Lestari, Dina Fitria, dan Admi Syafriandi, Salma
UNP Journal of Statistics and Data Science, published 2024-02-25. DOI: 10.24036/ujsds/vol2-iss1/144 (https://doi.org/10.24036/ujsds/vol2-iss1/144)
Citations: 0

Abstract

The C5.0 and CART algorithms are similar in speed and in their handling of categorical and numeric data. They differ in that CART builds binary trees and accepts categorical, numerical, and continuous response variables, producing either classification or regression trees, whereas C5.0 builds non-binary trees and accepts only categorical response variables, producing classification trees. This research classifies Kaggle's Stroke Prediction Dataset to identify the variables that most influence stroke risk and to compare the classification performance of the two algorithms. The results show that CART has higher accuracy and precision than C5.0 but lower recall: accuracy is 77.9% for CART and 77.5% for C5.0, precision is 89.5% and 83.2%, and recall is 67% and 71.4%, respectively. Overall, it can be concluded that the two algorithms do not differ in classification performance. In addition, CART identifies three variables as most influential on stroke risk (age, BMI, and average blood glucose level), whereas C5.0 identifies only two (age and average blood glucose level).
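The three metrics quoted in the abstract can be sketched from a binary confusion matrix as below. This is a minimal illustration, not the authors' code: the class `ConfusionCounts`, the function names, and the example counts are hypothetical, and the abstract does not publish the underlying confusion matrices. The F1 score (harmonic mean of precision and recall) is not reported in the abstract either; it is included only as one possible way to see how the precision and recall trade-off balances out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; "positive" is taken to mean stroke."""
    tp: int  # stroke cases predicted as stroke
    fp: int  # non-stroke cases predicted as stroke
    fn: int  # stroke cases predicted as non-stroke
    tn: int  # non-stroke cases predicted as non-stroke


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Share of predicted stroke cases that really were stroke cases."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of real stroke cases that the model found."""
    return c.tp / (c.tp + c.fn)


def f1_from(precision_value: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall (not reported in the abstract)."""
    return 2 * precision_value * recall_value / (precision_value + recall_value)


# Hypothetical counts for illustration only; they are NOT taken from the paper.
example = ConfusionCounts(tp=60, fp=10, fn=30, tn=100)
print(f"accuracy={accuracy(example):.3f}",
      f"precision={precision(example):.3f}",
      f"recall={recall(example):.3f}")

# Precision and recall as reported in the abstract (CART first, then C5.0).
reported = {"CART": (0.895, 0.67), "C5.0": (0.832, 0.714)}
f1_scores = {name: f1_from(p, r) for name, (p, r) in reported.items()}
for name, score in f1_scores.items():
    print(f"{name}: F1 derived from reported precision and recall = {score:.3f}")
```

With the reported figures, the derived F1 comes out at roughly 0.77 for both algorithms, which is in line with the abstract's conclusion that the two classifiers perform about the same even though CART favors precision and C5.0 favors recall.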