Towards Hybrid Architectures for Big Data Analytics: Insights From Spark-MPI Integration

Impact factor 5.8 · CAS Zone 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Mengbing Zhou;Qiuyan Li;Mingyuan Cai;Chengzhong Xu;Yang Wang
DOI: 10.1109/TSC.2025.3562342
Journal: IEEE Transactions on Services Computing, vol. 18, no. 3, pp. 1852-1868
Published: 2025-04-18 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10970102/
Citation count: 0

Abstract

High-Performance Data Analytics (HPDA) combines high-performance computing (HPC) with data analytics to uncover patterns and insights in dual-intensive applications that are both data-intensive and compute-intensive. Traditional Big Data frameworks and HPC technologies often struggle to address these demands independently, prompting researchers to explore their integration. Spark, known for its efficient in-memory computing with RDDs, and MPI, a foundational standard in HPC, are prominent candidates for such integration. This survey explores the integration of Spark and MPI for HPDA, highlighting their potential for unified data processing and computation. We first classify application workloads and review the characteristics and limitations of traditional frameworks. Then, we analyze the challenges and requirements of integrated architectures, focusing on the specific implementations of typical middleware-level architectures. Through comparative analysis, we highlight their advantages and limitations. Finally, we present application examples, outline key challenges and future research directions, and briefly discuss progress in integration approaches for other technology combinations.
Source journal
IEEE Transactions on Services Computing (COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING)
CiteScore: 11.50
Self-citation rate: 6.20%
Articles published: 278
Review time: >12 weeks
Journal description: IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It emphasizes algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services. They also cover composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, and collaboration within services computing.