Towards Hybrid Architectures for Big Data Analytics: Insights From Spark-MPI Integration
Mengbing Zhou; Qiuyan Li; Mingyuan Cai; Chengzhong Xu; Yang Wang
IEEE Transactions on Services Computing, vol. 18, no. 3, pp. 1852-1868, published 2025-04-18
DOI: 10.1109/TSC.2025.3562342
URL: https://ieeexplore.ieee.org/document/10970102/
Impact factor: 5.8 (JCR Q1, Computer Science, Information Systems)
Citations: 0
Abstract
High-Performance Data Analytics (HPDA) combines high-performance computing (HPC) with data analytics to uncover patterns and insights in dual-intensive applications that are both data-intensive and compute-intensive. Traditional Big Data frameworks and HPC technologies often struggle to address these demands independently, prompting researchers to explore their integration. Spark, known for its efficient in-memory computing with RDDs, and MPI, a foundational standard in HPC, are prominent candidates for such integration. This survey explores the integration of Spark and MPI for HPDA, highlighting their potential for unified data processing and computation. We first classify application workloads and review the characteristics and limitations of traditional frameworks. Then, we analyze the challenges and requirements of integrated architectures, focusing on the specific implementations of typical middleware-level architectures. Through comparative analysis, we highlight their advantages and limitations. Finally, we present application examples, outline key challenges and future research directions, and briefly discuss progress in integration approaches for other technology combinations.
About the journal:
IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It places emphasis on algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The journal addresses mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services. It also covers composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, and collaboration, among other areas of services computing.