Processing big data with decision trees: A case study in large traffic data
H. Wisesa, M. A. Ma'sum, P. Mursanto, A. Febrian
2016 International Workshop on Big Data and Information Security (IWBIS), 2016-10-01
DOI: 10.1109/IWBIS.2016.7872899 (https://doi.org/10.1109/IWBIS.2016.7872899)
Citations: 3
Abstract
This paper compares three tools for processing large traffic data with decision trees. The experiments used three popular classifier tools that are widely used in the community: the WEKA classifier, the MOA (Massive Online Analysis) classifier, and Spark MLlib running on Hadoop infrastructure. We applied decision trees to the traffic data because they are among the best methods for regression on large data. The experimental results showed that the WEKA classifier fails to classify datasets with a large number of instances, whereas MOA successfully performed regression on the dataset as a data stream. The Spark MLlib decision tree algorithm also performed regression on the traffic data quickly and with fairly good accuracy.
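
The abstract does not describe how the trees were configured in any of the three tools, so the following is only a minimal sketch of the underlying idea: how a regression tree chooses a single split by minimizing the squared error on each side. It uses only the Python standard library. The function names (`sse`, `best_split`, `predict`) and the tiny traffic-like dataset (hour of day against vehicles per minute) are hypothetical and are not taken from the paper.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total SSE.

    Candidate thresholds are midpoints between consecutive distinct x values.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]


def predict(stump, x):
    """Predict with a one-split tree: the mean of the side x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Synthetic, made-up data (NOT from the paper): hour of day -> vehicles per minute.
hours = [1, 2, 3, 4, 7, 8, 9, 10]
counts = [5, 6, 5, 4, 40, 42, 38, 40]

stump = best_split(hours, counts)
print(stump, predict(stump, 3), predict(stump, 9))
```

A full regression tree applies the same split search recursively to each side. This batch version keeps and sorts all the data in memory, which is the kind of cost that grows with the number of instances; stream-oriented and distributed learners avoid it by processing data incrementally or in partitions.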