SIM-PIPE DryRunner:一种测试基于容器的大数据管道和生成模拟数据的方法

Aleena Thomas, Nikolay Nikolov, Antoine Pultier, D. Roman, B. Elvesæter, A. Soylu
{"title":"SIM-PIPE DryRunner:一种测试基于容器的大数据管道和生成模拟数据的方法","authors":"Aleena Thomas, Nikolay Nikolov, Antoine Pultier, D. Roman, B. Elvesæter, A. Soylu","doi":"10.1109/COMPSAC54236.2022.00182","DOIUrl":null,"url":null,"abstract":"Big data pipelines are becoming increasingly vital in a wide range of data intensive application domains such as digital healthcare, telecommunication, and manufacturing for efficiently processing data. Data pipelines in such domains are complex and dynamic and involve a number of data processing steps that are deployed on heterogeneous computing resources under the realm of the Edge-Cloud paradigm. The processes of testing and simulating big data pipelines on heterogeneous resources need to be able to accurately represent this complexity. However, since big data processing is heavily resource-intensive, it makes testing and simulation based on historical execution data impractical. In this paper, we introduce the SIM - PIPE Dry Runner approach - a dry run approach that deploys a big data pipeline step by step in an isolated environment and executes it with sample data; this approach could be used for testing big data pipelines and realising practical simulations using existing simulators.","PeriodicalId":330838,"journal":{"name":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"SIM-PIPE DryRunner: An approach for testing container-based big data pipelines and generating simulation data\",\"authors\":\"Aleena Thomas, Nikolay Nikolov, Antoine Pultier, D. Roman, B. Elvesæter, A. Soylu\",\"doi\":\"10.1109/COMPSAC54236.2022.00182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Big data pipelines are becoming increasingly vital in a wide range of data intensive application domains such as digital healthcare, telecommunication, and manufacturing for efficiently processing data. Data pipelines in such domains are complex and dynamic and involve a number of data processing steps that are deployed on heterogeneous computing resources under the realm of the Edge-Cloud paradigm. The processes of testing and simulating big data pipelines on heterogeneous resources need to be able to accurately represent this complexity. However, since big data processing is heavily resource-intensive, it makes testing and simulation based on historical execution data impractical. In this paper, we introduce the SIM - PIPE Dry Runner approach - a dry run approach that deploys a big data pipeline step by step in an isolated environment and executes it with sample data; this approach could be used for testing big data pipelines and realising practical simulations using existing simulators.\",\"PeriodicalId\":330838,\"journal\":{\"name\":\"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COMPSAC54236.2022.00182\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC54236.2022.00182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

为了高效地处理数据,大数据管道在数字医疗、电信和制造业等广泛的数据密集型应用领域变得越来越重要。这些领域中的数据管道是复杂和动态的,并且涉及许多数据处理步骤,这些步骤部署在边缘云范式领域下的异构计算资源上。在异构资源上测试和模拟大数据管道的过程需要能够准确地表示这种复杂性。然而,由于大数据处理是资源密集型的,因此基于历史执行数据的测试和模拟是不切实际的。在本文中,我们介绍了SIM - PIPE干流方法——一种在孤立环境中逐步部署大数据管道并使用样本数据执行的干流方法;这种方法可以用于测试大数据管道,并使用现有模拟器实现实际模拟。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
SIM-PIPE DryRunner: An approach for testing container-based big data pipelines and generating simulation data
Big data pipelines are becoming increasingly vital in a wide range of data intensive application domains such as digital healthcare, telecommunication, and manufacturing for efficiently processing data. Data pipelines in such domains are complex and dynamic and involve a number of data processing steps that are deployed on heterogeneous computing resources under the realm of the Edge-Cloud paradigm. The processes of testing and simulating big data pipelines on heterogeneous resources need to be able to accurately represent this complexity. However, since big data processing is heavily resource-intensive, it makes testing and simulation based on historical execution data impractical. In this paper, we introduce the SIM - PIPE Dry Runner approach - a dry run approach that deploys a big data pipeline step by step in an isolated environment and executes it with sample data; this approach could be used for testing big data pipelines and realising practical simulations using existing simulators.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信