Vikas Singh, Rahul K. Gupta, R. K. Sevakula, N. Verma
2016 11th International Conference on Industrial and Information Systems (ICIIS), December 2016. DOI: 10.1109/ICIINFS.2016.8262961
Comparative analysis of Gaussian mixture model, logistic regression and random forest for big data classification using map reduce
Big data has become a major force transforming modern technology, and the amount of data generated by humankind grows every year. Classifying such data with standard data mining techniques is a challenging task. This paper presents a MapReduce-based algorithm using a Gaussian mixture model (GMM), logistic regression (LR), and a random forest classifier (RFC). The map phase computes the class probabilities and class labels of the test data, and the reduce phase predicts the final class labels of the test data by aggregating the results from all mappers. We analyze these algorithms in terms of test accuracy, run time, and number of mappers on multiple big data sets.
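The abstract does not specify how the training data is split across mappers or how the reducer combines their outputs. The following single-process sketch illustrates the general map/reduce classification pattern under stated assumptions: round-robin partitioning of the training set, one independently trained model per mapper that emits class probabilities for every test point, and a reducer that sums (equivalently, averages) those probabilities per test point and takes the arg max. To stay dependency-free, it swaps in a simple per-class diagonal Gaussian classifier in place of the paper's GMM, LR, and RFC. All function names are hypothetical, and the code does not run on Hadoop.

```python
import math
from collections import defaultdict


def fit_gaussian_classifier(rows):
    """Fit one diagonal Gaussian per class (a simple stand-in for GMM/LR/RFC)."""
    by_class = defaultdict(list)
    for x, y in rows:
        by_class[y].append(x)
    n = len(rows)
    model = {}
    for label, xs in by_class.items():
        dim = len(xs[0])
        means = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
        # Floor the variance so constant features or single-point classes do not divide by zero.
        variances = [
            max(sum((x[d] - means[d]) ** 2 for x in xs) / len(xs), 1e-6)
            for d in range(dim)
        ]
        model[label] = (len(xs) / n, means, variances)
    return model


def predict_proba(model, x):
    """Return normalized class probabilities for one test point."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        log_scores[label] = ll
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {k: s / total for k, s in exp_scores.items()}


def map_phase(partition, test_points):
    """One mapper: train on its own partition, emit (test_index, class_probabilities)."""
    model = fit_gaussian_classifier(partition)
    for i, x in enumerate(test_points):
        yield i, predict_proba(model, x)


def reduce_phase(mapper_outputs):
    """Sum each test point's class probabilities over all mappers, then take the arg max."""
    totals = defaultdict(lambda: defaultdict(float))
    for i, probs in mapper_outputs:
        for label, p in probs.items():
            totals[i][label] += p
    return {i: max(scores, key=scores.get) for i, scores in totals.items()}


def mapreduce_classify(train, test_points, n_mappers):
    # Hypothetical round-robin split of the training data, one partition per mapper.
    partitions = [train[k::n_mappers] for k in range(n_mappers)]
    emitted = []
    for part in partitions:
        if part:  # skip empty partitions when n_mappers > len(train)
            emitted.extend(map_phase(part, test_points))
    labels = reduce_phase(emitted)
    return [labels[i] for i in range(len(test_points))]


if __name__ == "__main__":
    train = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"), ((0.3, 0.2), "a"),
             ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"), ((5.1, 5.3), "b")]
    print(mapreduce_classify(train, [(0.1, 0.1), (5.0, 5.0)], n_mappers=2))  # → ['a', 'b']
```

Summing probabilities, a form of soft voting, is only one plausible reducer. Because the abstract says the mappers emit class labels as well as probabilities, a majority vote over mapper labels would fit its description equally well. Varying `n_mappers` in a sketch like this is one way to observe the accuracy and run-time trade-off the paper evaluates.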