Big Data Analytics Using Machine Learning Techniques for Prediction on Datasets

Computational Intelligence and Machine Learning. Pub Date: 2023-04-14. DOI: 10.36647/ciml/04.01.a002

Ankit Verma

Abstract

Data analytics is the scientific and statistical analysis of raw data to turn it into information from which knowledge can be gained. A recent trend in feature abstraction combines computational techniques with big data analysis. This requires drawing on trustworthy data sources, processing information quickly, and making accurate predictions. The primary objective of this study is to identify, using the proposed model, the machine learning strategies that produce the most accurate predictions. Supervised and unsupervised strategies have previously been implemented in various ways using the MapReduce methodology; the proposed model instead uses the Apache Spark framework to compare the existing methods. The study focuses on characterising the datasets so that the analysis with machine learning techniques is as accurate as possible. The datasets are analysed with linear regression, decision tree, random forest, and gradient-boosted tree algorithms. The findings indicate that running the machine learning methods on the Spark framework improves the efficiency of the model by 70 percent compared with the MapReduce paradigm.

Keywords: Apache Spark Framework, Big Data Analytics, Machine Learning Algorithms, MapReduce Paradigm.
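The abstract describes comparing several regression methods by prediction accuracy but gives no implementation details. The following is only a minimal sketch of that kind of comparison, not the paper's Spark pipeline: it uses plain Python, synthetic data, and two small stand-in models (one-feature least-squares regression and a depth-1 regression tree). The function names, the data-generating formula, the 80/20 split, and the use of RMSE as the score are all assumptions made for illustration.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the split that minimises squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def rmse(model, xs, ys):
    """Root mean squared error of a predictor on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def compare_models(seed=0, n=200):
    """Train both models on synthetic data and score them on a held-out split."""
    rng = random.Random(seed)
    # Synthetic, roughly linear data (an assumption; not the paper's datasets).
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]
    split = int(0.8 * n)
    train_x, test_x = xs[:split], xs[split:]
    train_y, test_y = ys[:split], ys[split:]
    models = {
        "linear_regression": fit_linear(train_x, train_y),
        "decision_stump": fit_stump(train_x, train_y),
    }
    return {name: rmse(model, test_x, test_y) for name, model in models.items()}


if __name__ == "__main__":
    # Print models from lowest to highest held-out error.
    for name, score in sorted(compare_models().items(), key=lambda kv: kv[1]):
        print(f"{name}: RMSE = {score:.3f}")
```

On data generated from a linear relationship, the linear model would be expected to score the lower error; on other datasets the ranking could differ, which is why a study of this kind scores each method per dataset.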

