An Experimental Based Study to Evaluate the Efficiency among Stream Processing Tools

Akshay Mudgal, Shaveta Bhatia
{"title":"An Experimental Based Study to Evaluate the Efficiency among Stream Processing Tools","authors":"Akshay Mudgal, Shaveta Bhatia","doi":"10.34028/iajit/20/6/11","DOIUrl":null,"url":null,"abstract":"With the advancement in internet technology, augmentation in regular data generation has been amplified at a drastic level. Several different industries, for instance hospitality, defense, railways, health care, social media, education, etc., are creating and crafting different and several types of raw and processed data at a significant level, whereas, each of them has their own unique reason to shelter and call their data imperative and crucial. Such large and huge amount of data needs some space to get saved and secured, this is what Big Data is. A Data Stream Processing Technology (DSPT) is the significant mechanism and the mainstay for compiling and computing the large amount of data as well as the way to collect and process the raw data to call it information. There are varieties of DSPT like Apache Spark, Flink, Kafka, Storm, Samza, Hadoop, Atlas.ti, Cassandra, etc. This paper aims at comparing the five well- known and widely used open source big data DSPT (i.e., Apache Spark, Flink, Kafka, Storm, and Samza). An extensive comparison will be performed based on 12 different yet interconnected standards. A matrix has been designed through which five different experiments were executed, based on which the juxtaposition will be prepared. This paper summarizes an extensive study of open source big data DPST with a practical experimental approach in a well-controlled and sophisticated environment","PeriodicalId":161392,"journal":{"name":"The International Arab Journal of Information Technology","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International Arab Journal of Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34028/iajit/20/6/11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

With the advancement in internet technology, augmentation in regular data generation has been amplified at a drastic level. Several different industries, for instance hospitality, defense, railways, health care, social media, education, etc., are creating and crafting different and several types of raw and processed data at a significant level, whereas, each of them has their own unique reason to shelter and call their data imperative and crucial. Such large and huge amount of data needs some space to get saved and secured, this is what Big Data is. A Data Stream Processing Technology (DSPT) is the significant mechanism and the mainstay for compiling and computing the large amount of data as well as the way to collect and process the raw data to call it information. There are varieties of DSPT like Apache Spark, Flink, Kafka, Storm, Samza, Hadoop, Atlas.ti, Cassandra, etc. This paper aims at comparing the five well- known and widely used open source big data DSPT (i.e., Apache Spark, Flink, Kafka, Storm, and Samza). An extensive comparison will be performed based on 12 different yet interconnected standards. A matrix has been designed through which five different experiments were executed, based on which the juxtaposition will be prepared. This paper summarizes an extensive study of open source big data DPST with a practical experimental approach in a well-controlled and sophisticated environment
基于实验的流处理工具效率评估研究
随着互联网技术的进步,常规数据生成的增强已经在一个剧烈的水平上被放大。几个不同的行业,例如酒店、国防、铁路、医疗保健、社交媒体、教育等,都在很大程度上创造和制作不同类型的原始和处理过的数据,然而,每个行业都有自己独特的理由来保护和称他们的数据是必要的和至关重要的。如此庞大的数据需要一定的空间来保存和保护,这就是大数据。数据流处理技术(Data Stream Processing Technology, DSPT)是对大量数据进行编译和计算的重要机制和支柱,也是对原始数据进行采集和处理,将其称为信息的方法。DSPT有多种,如Apache Spark, Flink, Kafka, Storm, Samza, Hadoop, Atlas。ti, Cassandra等。本文旨在比较五大知名且广泛使用的开源大数据DSPT(即Apache Spark、Flink、Kafka、Storm和Samza)。将根据12个不同但相互关联的标准进行广泛的比较。设计了一个矩阵,通过它执行了五个不同的实验,并置将在此基础上准备。本文总结了开源大数据DPST的广泛研究,并在一个良好控制和复杂的环境中采用了实际的实验方法
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信