Comparative analysis of Gaussian mixture model, logistic regression and random forest for big data classification using map reduce
Vikas Singh, Rahul K. Gupta, R. K. Sevakula, N. Verma
2016 11th International Conference on Industrial and Information Systems (ICIIS), 2016-12-01
DOI: 10.1109/ICIINFS.2016.8262961 (https://doi.org/10.1109/ICIINFS.2016.8262961)
Citations: 7
Abstract
Big data has become a major driver of new technology, and the amount of data generated by mankind grows every year. Classifying data at this scale is a challenging task for standard data mining techniques. This paper presents MapReduce-based algorithms built on a Gaussian mixture model (GMM), logistic regression (LR), and a random forest classifier (RFC). In the map phase, each mapper determines the probabilities and class labels of the test data; in the reduce phase, the final class labels of the test data are predicted by aggregating the results from all mappers. We analyze these algorithms in terms of test accuracy, run time, and number of mappers on multiple big data sets.
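The map and reduce phases described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation and rests on several assumptions: one diagonal Gaussian per class stands in for the GMM, each mapper is assumed to train on its own partition of the training data, and the reducer is assumed to average the mappers' class probabilities before taking the most probable label. The abstract does not state the aggregation rule, and all function names here (`fit_gaussians`, `class_probs`, `mapper`, `reducer`, `classify`) are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussians(rows):
    """Fit one diagonal Gaussian per class; rows are (features, label) pairs.

    Assumption: a single Gaussian per class is a simplified stand-in for a GMM.
    """
    by_class = defaultdict(list)
    for x, y in rows:
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        n, d = len(xs), len(xs[0])
        mean = [sum(x[j] for x in xs) / n for j in range(d)]
        # A small floor keeps the variance positive on tiny partitions.
        var = [sum((x[j] - mean[j]) ** 2 for x in xs) / n + 1e-6 for j in range(d)]
        model[y] = (mean, var, n / len(rows))
    return model


def class_probs(model, x):
    """Return posterior class probabilities for one test point."""
    log_p = {}
    for y, (mean, var, prior) in model.items():
        ll = math.log(prior)
        for xj, m, v in zip(x, mean, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xj - m) ** 2 / v)
        log_p[y] = ll
    top = max(log_p.values())  # subtract the max for numerical stability
    exp_p = {y: math.exp(lp - top) for y, lp in log_p.items()}
    total = sum(exp_p.values())
    return {y: p / total for y, p in exp_p.items()}


def mapper(partition, test_points):
    """Map phase: train on one partition, emit (test_index, probabilities)."""
    model = fit_gaussians(partition)
    return [(i, class_probs(model, x)) for i, x in enumerate(test_points)]


def reducer(mapper_outputs):
    """Reduce phase: sum probabilities per test point and pick the top class.

    Assumption: averaging (equivalently, summing) is the aggregation rule.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for output in mapper_outputs:
        for i, probs in output:
            for y, p in probs.items():
                sums[i][y] += p
    return {i: max(s, key=s.get) for i, s in sums.items()}


def classify(partitions, test_points):
    """Run every mapper, then reduce their outputs into final labels."""
    return reducer([mapper(p, test_points) for p in partitions])
```

In a real MapReduce deployment the mappers would run in parallel on separate nodes and the framework would group their output by test index before the reduce step; the loop in `classify` only mimics that flow in one process.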