The Comparison of C4.5 and CART (Classification and Regression Tree) Algorithm in Classification of Occupation for Fresh Graduate

Febian Joshua Reynara, Sepriana Carolina, Iustisia Natalia Simbolon
Published in: Proceedings of the 4th International Conference on Vocational Education and Technology, IConVET 2021, 27 November 2021, Singaraja, Bali, Indonesia
DOI: 10.4108/eai.27-11-2021.2315527 (https://doi.org/10.4108/eai.27-11-2021.2315527)

Abstract

College students often find it difficult to determine an appropriate field of work after they graduate. In this study, alumni field-of-work data were used to classify fields of work with data mining methods. The alumni data contained attributes such as gender, study program, practical work topic, type of practical work company, final project topic, and year of graduation. The classification was carried out as three experiments: one with eight target categories (STQA Engineer, Software and Mobile Application Developer, Web Developer, UI/UX Designer, Software and Business Analyst, Lecturer and Researcher, AI Engineer, DevOps and Cybersecurity Practitioner), one with three target categories (SQA, Programmer, and Data Manager and Analyst), and one with two target categories (Programmer and Non-Programmer). The data mining algorithms used were C4.5 and CART (Classification and Regression Tree). C4.5 achieved an accuracy of 42% in the eight-category experiment, 58% in the three-category experiment, and 75% in the two-category experiment. CART achieved 43%, 61%, and 77%, respectively. Based on these results, CART is the better of the two algorithms for classifying fields of work from alumni data, although the difference is small (1 to 3 percentage points).
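The abstract does not describe how either tree was built or which implementation was used. As general background, C4.5 typically ranks candidate splits by gain ratio, while CART uses Gini impurity for classification. The following is a minimal sketch of those two criteria, not the authors' code; the attribute values "A" and "B" and the eight toy records are invented for illustration, and only the Programmer / Non-Programmer labels come from the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the value of one categorical attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def gain_ratio(values, labels):
    """C4.5-style criterion: information gain divided by split information."""
    n = len(labels)
    groups = partition(values, labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute values themselves
    return info_gain / split_info if split_info > 0 else 0.0


def gini_gain(values, labels):
    """CART-style criterion: reduction in Gini impurity after the split.

    CART proper only makes binary splits; the toy attribute below has two
    values, so the same partition serves both criteria.
    """
    n = len(labels)
    groups = partition(values, labels)
    weighted = sum(len(g) / n * gini(g) for g in groups)
    return gini(labels) - weighted


# Hypothetical toy data: study program ("A" or "B") versus target class.
study_program = ["A", "A", "A", "A", "B", "B", "B", "B"]
target = [
    "Programmer", "Programmer", "Programmer", "Non-Programmer",
    "Non-Programmer", "Non-Programmer", "Non-Programmer", "Programmer",
]

print(f"gain ratio (C4.5): {gain_ratio(study_program, target):.4f}")
print(f"Gini gain (CART):  {gini_gain(study_program, target):.4f}")
```

Both criteria reward splits that make the child nodes purer, which may be one reason the two algorithms can produce similar trees and similar accuracy on the same data.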