Bench marking of classification algorithms: Decision Trees and Random Forests - a case study using R

Manish Varma Datla
{"title":"Bench marking of classification algorithms: Decision Trees and Random Forests - a case study using R","authors":"Manish Varma Datla","doi":"10.1109/ITACT.2015.7492647","DOIUrl":null,"url":null,"abstract":"Decision Trees and Random Forests are leading Machine Learning Algorithms, which are used for Classification purposes. Through the course of this paper, a comparison is made of classification results of these two algorithms, for classifying data sets obtained from Kaggle's Bike Sharing System and Titanic problems. The solution methodology deployed is primarily broken into two segments. First, being Feature Engineering where the given instance variables are made noise free and two or more variables are used together to give rise to a valuable third. Secondly, the classification parameters are worked out, consisting of correctly classified instances, incorrectly classified instances, Precision and Accuracy. This process ensured that the instance variables and classification parameters were best treated before they were deployed with the two algorithms i.e. Decision Trees and Random Forests. The developed model has been validated by using Systems data and the Classification results. From the model it can safely be concluded that for all classification problems Decision Trees is handy with small data sets i.e. less number of instances and Random Forests gives better results for the same number of attributes and large data sets i.e. with greater number of instances. R language has been used to solve the problem and to present the results.","PeriodicalId":336783,"journal":{"name":"2015 International Conference on Trends in Automation, Communications and Computing Technology (I-TACT-15)","volume":"163 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Trends in Automation, Communications and Computing Technology (I-TACT-15)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITACT.2015.7492647","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

Decision Trees and Random Forests are leading machine learning algorithms used for classification. This paper compares the classification results of the two algorithms on data sets obtained from Kaggle's Bike Sharing System and Titanic problems. The solution methodology is broken into two segments. First, feature engineering, in which the given instance variables are cleaned of noise and two or more variables are combined to derive a valuable third. Second, the classification parameters are worked out: correctly classified instances, incorrectly classified instances, precision, and accuracy. This process ensured that the instance variables and classification parameters were properly treated before being used with the two algorithms, Decision Trees and Random Forests. The developed model has been validated using the systems' data and the classification results. From the model it can safely be concluded that Decision Trees are handy for classification problems with small data sets (fewer instances), while Random Forests give better results for the same number of attributes on large data sets (more instances). The R language has been used to solve the problem and to present the results.
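The workflow the abstract describes (feature engineering, then fitting a decision tree and a random forest and comparing their classification parameters) can be sketched in R as below. This is a minimal illustration, not the author's implementation: the input file name, the Titanic-style column names (Survived, Pclass, Sex, Age, SibSp, Parch, Fare), and the derived FamilySize feature are assumptions based on the Kaggle data sets mentioned in the abstract.

```r
# Minimal sketch (assumed column names from the Kaggle Titanic data set):
# feature engineering, then comparing a decision tree with a random forest.
library(rpart)         # decision trees (CART)
library(randomForest)  # random forests

train <- read.csv("train.csv")            # hypothetical input file
train$Survived <- factor(train$Survived)  # classification target
train$Sex      <- factor(train$Sex)

# Simple feature engineering: impute missing Age, derive FamilySize
train$Age[is.na(train$Age)] <- median(train$Age, na.rm = TRUE)
train$FamilySize <- train$SibSp + train$Parch + 1

# Hold out 30% of the instances for evaluation
set.seed(42)
idx <- sample(nrow(train), floor(0.7 * nrow(train)))
tr  <- train[idx, ]
te  <- train[-idx, ]

fml <- Survived ~ Pclass + Sex + Age + Fare + FamilySize
tree_fit <- rpart(fml, data = tr, method = "class")
rf_fit   <- randomForest(fml, data = tr, ntree = 500)

tree_pred <- predict(tree_fit, te, type = "class")
rf_pred   <- predict(rf_fit, te)

# Classification parameters: confusion matrices, accuracy, precision
cm_tree <- table(Predicted = tree_pred, Actual = te$Survived)
cm_rf   <- table(Predicted = rf_pred,   Actual = te$Survived)
accuracy  <- function(cm) sum(diag(cm)) / sum(cm)
precision <- function(cm) cm["1", "1"] / sum(cm["1", ])  # TP / (TP + FP)

c(tree_accuracy  = accuracy(cm_tree),  forest_accuracy  = accuracy(cm_rf),
  tree_precision = precision(cm_tree), forest_precision = precision(cm_rf))
```

On a small sample this comparison will often favour the single tree, while the random forest tends to pull ahead as the number of instances grows, which is the trade-off the paper benchmarks.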