{"title":"Big Data Processing using Machine Learning algorithms: MLlib and Mahout Use Case","authors":"Khadija Aziz, Dounia Zaidouni, M. Bellafkih","doi":"10.1145/3289402.3289525","DOIUrl":null,"url":null,"abstract":"Machine learning is a field within artificial intelligence that allows machines to learn on their own from existing information to make predictions or/and decisions. There are three main categories of machine learning techniques: Collaborative filtering (for making recommendations), Clustering (for discovering structure in collections of data) and Classification (form of supervised learning). Machine learning helps users to make better decisions, Machine learning algorithms create patterns based on previous information and use them to design predictive models, then, use this models to obtain predictions about future data. A huge amount of data from several sources need methods and techniques to be processed correctly, in order to exploit this data efficiently, machine learning is a great technology for exploiting the needs in big data analysis. This paper describes the implementation of Apache Spark MLlib and Apache Mahout in order to process Big Data using Machine Learning algorithms. Furthermore, we conduct experimental simulations to show the difference between this two Machine Learning frameworks. Subsequently, we discuss the most striking observations that emerge from the comparison of these technologies through several experimental studies.","PeriodicalId":199959,"journal":{"name":"Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3289402.3289525","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Machine learning is a field within artificial intelligence that allows machines to learn on their own from existing information to make predictions or/and decisions. There are three main categories of machine learning techniques: Collaborative filtering (for making recommendations), Clustering (for discovering structure in collections of data) and Classification (form of supervised learning). Machine learning helps users to make better decisions, Machine learning algorithms create patterns based on previous information and use them to design predictive models, then, use this models to obtain predictions about future data. A huge amount of data from several sources need methods and techniques to be processed correctly, in order to exploit this data efficiently, machine learning is a great technology for exploiting the needs in big data analysis. This paper describes the implementation of Apache Spark MLlib and Apache Mahout in order to process Big Data using Machine Learning algorithms. Furthermore, we conduct experimental simulations to show the difference between this two Machine Learning frameworks. Subsequently, we discuss the most striking observations that emerge from the comparison of these technologies through several experimental studies.