An Experimental Based Study to Evaluate the Efficiency among Stream Processing Tools

The International Arab Journal of Information Technology, Pub Date: 2023-01-01, DOI: 10.34028/iajit/20/6/11

Akshay Mudgal, Shaveta Bhatia


Citations: 0

Abstract


An Experimental Based Study to Evaluate the Efficiency among Stream Processing Tools

Advances in internet technology have drastically increased the rate at which data is generated. Industries such as hospitality, defense, railways, health care, social media, and education create many types of raw and processed data in significant volumes, and each has its own reasons to protect its data and regard it as crucial. Data at this scale needs space in which to be stored and secured; this is what Big Data is. A Data Stream Processing Technology (DSPT) is the core mechanism for compiling and computing large amounts of data, and for collecting and processing raw data so that it becomes information. There are many DSPTs, such as Apache Spark, Flink, Kafka, Storm, Samza, Hadoop, Atlas.ti, and Cassandra. This paper compares five well-known and widely used open source big data DSPTs: Apache Spark, Flink, Kafka, Storm, and Samza. The comparison is based on 12 distinct yet interconnected criteria. A matrix was designed through which five different experiments were executed, and the comparison is drawn from their results. The paper summarizes an extensive study of open source big data DSPTs, carried out with a practical experimental approach in a well-controlled and sophisticated environment.
