Data Mining with R: An Applied Study

Burcu Durmuş, Ö. I. Güneri
International Journal of Computing Sciences Research, published 2019-09-01 (Journal Article)
DOI: 10.25147/ijcsr.2017.001.1.34
Citations: 1

Abstract

Purpose – The aim of this study is to analyze different classification algorithms in R and to determine their accuracy rates. The study also encourages the use of R by giving readers the opportunity to experiment with the code.

Method – For these purposes, data sets suitable for classification were obtained from the UC Irvine Machine Learning Repository (2019). After the data sets and the R environment were prepared for data mining, performance was evaluated with three classification algorithms (J48, Random Forest, Naive Bayes). Accuracy was the criterion used to interpret the results.

Results – Accuracy rates were determined for three data sets. On the "wine" data set, all three algorithms performed very well. The results for the other two data sets (lenses and liver) follow a similar pattern. The one exception is the "liver" data set, where Naive Bayes reached a somewhat lower accuracy than expected (0.55).

Conclusion – This study compares the performance of classification algorithms for data mining in R, using accuracy as the criterion. All code is given together with its output so that it can serve as an example, especially for young researchers and students. The authors expect that the study can be a resource for other researchers, will encourage the use of R, and will lead researchers and students to new work by trying out the code.

Recommendations – Subsequent studies could build on the given code to carry out similar analyses, or could examine how classification analysis can be performed in R with other algorithms.
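Accuracy, the only evaluation criterion in this study, is the proportion of evaluated instances whose predicted class equals the true class. Equivalently, it is the sum of the diagonal of the confusion matrix divided by the total count. The sketch below illustrates this computation. It is written in Python purely for illustration (the paper's own code is in R), and the function names and labels are hypothetical, not taken from the paper or its data sets.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the true label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(actual, predicted))


def accuracy_from_matrix(matrix):
    """Accuracy as the diagonal (correct) count divided by the total count."""
    total = sum(matrix.values())
    diagonal = sum(n for (a, p), n in matrix.items() if a == p)
    return diagonal / total


if __name__ == "__main__":
    # Hypothetical labels for illustration only; not results from the paper.
    actual = ["hard", "soft", "none", "none", "soft", "none", "hard", "none"]
    predicted = ["hard", "none", "none", "none", "soft", "soft", "hard", "none"]
    cm = confusion_matrix(actual, predicted)
    print(accuracy(actual, predicted))  # 6 of 8 correct -> 0.75
    print(accuracy_from_matrix(cm))     # same value from the confusion matrix
```

On this reading, an accuracy of 0.55, as reported for Naive Bayes on the liver data, means that 55% of the evaluated instances were classified correctly.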