SIM-PIPE DryRunner: An approach for testing container-based big data pipelines and generating simulation data

2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC). Pub Date: 2022-06-01. DOI: 10.1109/COMPSAC54236.2022.00182

Aleena Thomas, Nikolay Nikolov, Antoine Pultier, D. Roman, B. Elvesæter, A. Soylu


Citations: 1

Abstract

Big data pipelines are becoming increasingly vital for efficiently processing data in a wide range of data-intensive application domains, such as digital healthcare, telecommunication, and manufacturing. Data pipelines in such domains are complex and dynamic, and they involve a number of data processing steps deployed on heterogeneous computing resources within the Edge-Cloud paradigm. Testing and simulating big data pipelines on heterogeneous resources must accurately represent this complexity. However, because big data processing is heavily resource-intensive, testing and simulation based on historical execution data are impractical. In this paper, we introduce SIM-PIPE DryRunner, a dry-run approach that deploys a big data pipeline step by step in an isolated environment and executes it with sample data; this approach could be used for testing big data pipelines and realising practical simulations using existing simulators.
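The dry-run idea in the abstract (run each pipeline step in isolation on sample data and record what happened) can be illustrated with a minimal Python sketch. This is not the authors' implementation: plain Python functions stand in for containerised steps, a fresh temporary directory stands in for the isolated environment, and the names `dry_run`, `StepMetrics`, `uppercase`, and `deduplicate_lines` are hypothetical. The recorded wall-clock time and output size are only examples of the per-step data a simulator might consume.

```python
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical step contract: read the input file, write the output file.
Step = Callable[[Path, Path], None]


@dataclass
class StepMetrics:
    name: str
    seconds: float      # wall-clock time spent in the step
    output_bytes: int   # size of the data handed to the next step


def dry_run(steps: List[Tuple[str, Step]], sample: bytes) -> List[StepMetrics]:
    """Run the steps one at a time on sample data and collect metrics."""
    metrics = []
    data = sample
    for name, step in steps:
        # A fresh scratch directory per step stands in for an isolated
        # container; nothing leaks between steps except the output data.
        with tempfile.TemporaryDirectory() as workdir:
            src = Path(workdir) / "input"
            dst = Path(workdir) / "output"
            src.write_bytes(data)
            start = time.perf_counter()
            step(src, dst)
            elapsed = time.perf_counter() - start
            data = dst.read_bytes()
        metrics.append(StepMetrics(name, elapsed, len(data)))
    return metrics


def uppercase(src: Path, dst: Path) -> None:
    dst.write_bytes(src.read_bytes().upper())


def deduplicate_lines(src: Path, dst: Path) -> None:
    # dict.fromkeys keeps the first occurrence of each line, in order.
    unique = dict.fromkeys(src.read_bytes().splitlines())
    dst.write_bytes(b"\n".join(unique))


report = dry_run(
    [("uppercase", uppercase), ("deduplicate", deduplicate_lines)],
    b"a\nb\na\n",
)
for m in report:
    print(f"{m.name}: {m.seconds:.6f} s, {m.output_bytes} bytes out")
```

Running steps strictly one after another keeps each measurement attributable to a single step, which is the property a step-by-step dry run relies on; a container-based version would replace each function call with a container launch under the same loop.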

