Journal of Parallel and Distributed Computing: Latest Articles

Efficient GPU-accelerated parallel cross-correlation
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-02-12 · DOI: 10.1016/j.jpdc.2025.105054
Karel Maděra, Adam Šmelko, Martin Kruliš

Abstract: Cross-correlation is a data analysis method widely employed in various signal processing and similarity-search applications. Our objective is to design a highly optimized GPU-accelerated implementation that speeds up these applications and also improves energy efficiency, since GPUs are more efficient than CPUs in data-parallel tasks. There are two rudimentary ways to compute cross-correlation: a definition-based algorithm that tries all possible overlaps, and an algorithm based on the Fourier transform, which is much more complex but has better asymptotic time complexity. We focus mainly on the definition-based approach, which is better suited for smaller input data, and we implement multiple CUDA-enabled algorithms with multiple optimization options. The algorithms were evaluated in various scenarios, including the most typical types of multi-signal correlations, and we provide empirically verified optimal solutions for each of the studied scenarios.

Volume 199, Article 105054 · Citations: 0
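For reference, the definition-based approach mentioned in this abstract can be sketched in a few lines: the NumPy version below simply tries every possible overlap of two 1-D signals. It is an illustrative baseline only, not the paper's CUDA implementation, and the function name is our own.

import numpy as np

def cross_correlate_definition(a, b):
    """Definition-based 1-D cross-correlation: try every possible overlap.

    Returns an array of length len(a) + len(b) - 1, where each entry is the
    dot product of the overlapping parts of `a` and a shifted copy of `b`.
    Equivalent to np.correlate(a, b, mode='full'), computed explicitly.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    out = np.zeros(n + m - 1)
    for shift in range(-(m - 1), n):                 # every possible overlap
        lo_a, hi_a = max(0, shift), min(n, shift + m)
        lo_b, hi_b = lo_a - shift, hi_a - shift
        out[shift + m - 1] = np.dot(a[lo_a:hi_a], b[lo_b:hi_b])
    return out

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
assert np.allclose(cross_correlate_definition(a, b), np.correlate(a, b, mode='full'))

The FFT-based alternative computes the same result via frequency-domain multiplication and wins asymptotically, which is why the paper's focus on the definition-based variant targets smaller inputs.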
GPU memory usage optimization for backward propagation in deep network training
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-02-11 · DOI: 10.1016/j.jpdc.2025.105053
Ding-Yong Hong, Tzu-Hsien Tsai, Ning Wang, Pangfeng Liu, Jan-Jan Wu

Abstract: In modern deep learning, it has become a trend to design larger Deep Neural Networks (DNNs) to execute more complex tasks with better accuracy. Meanwhile, Convolutional Neural Networks (CNNs) have become the standard method for most computer vision tasks. However, the memory allocated for intermediate data in convolution layers can cause severe memory pressure during model training. Many solutions have been proposed to resolve this problem. Besides hardware-dependent solutions, a general methodology, rematerialization, reduces GPU memory usage by efficiently trading computation for memory. The idea is to select a set of intermediate results during the forward phase as checkpoints and save only them in memory. The backward phase recomputes the remaining intermediate data from the closest checkpoint in memory as needed. This recomputation increases execution time but saves memory by not storing all intermediate results during the forward phase. In this paper, we focus on efficiently finding the optimal checkpoint subset that achieves the least peak memory usage during model training. We first describe the theoretical background of neural-network training using mathematical equations and use these equations to identify all essential data required during both the forward and backward phases to compute the gradients of the model's weights. We then formalize the checkpoint selection problem and propose a dynamic programming algorithm with time complexity O(n³) for finding the optimal checkpoint subset. Through extensive experiments, we refine the description of the problem using our theoretical analysis, revise the objective function based on tracing, and propose an O(n)-time algorithm for finding the optimal checkpoint subset.

Volume 199, Article 105053 · Citations: 0
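To make the rematerialization idea concrete, here is a minimal Python sketch (a toy example of our own, not the paper's algorithm or any framework API): the forward pass stores only the chosen checkpoints, and any other activation is recomputed from the nearest earlier checkpoint when the backward phase needs it.

# Layers are arbitrary stand-in functions; a real network would use a DL framework.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2, lambda x: x / 5]

def forward_with_checkpoints(x, layers, checkpoint_ids):
    """Run the forward pass, storing only activations whose index is in checkpoint_ids."""
    saved = {-1: x}                                   # the input itself is always available
    for i, layer in enumerate(layers):
        x = layer(x)
        if i in checkpoint_ids:
            saved[i] = x
    return x, saved

def recompute_activation(target, layers, saved):
    """Recompute the output of layer `target` from the nearest earlier checkpoint."""
    start = max(i for i in saved if i <= target)      # closest checkpoint at or before target
    x = saved[start]
    for i in range(start + 1, target + 1):
        x = layers[i](x)                              # trade extra compute for saved memory
    return x

out, saved = forward_with_checkpoints(4.0, layers, checkpoint_ids={1, 3})
# During the backward phase, activation 2 is not in memory and is rebuilt on demand:
act2 = recompute_activation(2, layers, saved)

The checkpoint-selection problem studied in the paper is then: which subset of indices should go into checkpoint_ids so that peak memory usage is minimized?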
An introductory-level undergraduate CS course that introduces parallel computing
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-02-04 · DOI: 10.1016/j.jpdc.2025.105044
Tia Newhall, Kevin C. Webb, Vasanta Chaganti, Andrew Danner

Abstract: We present the curricular design, pedagogy, and goals of an introductory-level course on computer systems that introduces parallel and distributed computing (PDC) to students who have only a CS1 background. With the ubiquity of multicore processors, cloud computing, and hardware accelerators, PDC topics have become fundamental knowledge areas in the undergraduate CS curriculum. As a result, it is increasingly important for students to learn a common core of introductory parallel and distributed computing topics and to develop parallel thinking skills early in their CS studies. Our introductory-level course focuses on three main curricular goals: 1) understanding how a computer runs a program, 2) evaluating system costs associated with running a program, and 3) taking advantage of the power of parallel computing. We elaborate on the goals and details of our course's key modules, and we discuss our pedagogical approach, which includes active-learning techniques. We also include an evaluation of our course and a discussion of our experiences teaching it since Fall 2012. We find that the PDC foundation gained through early exposure in our course helps students gain confidence in their ability to expand and apply their understanding of PDC concepts throughout their CS education.

Volume 199, Article 105044 · Citations: 0
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-02-02 · DOI: 10.1016/S0743-7315(25)00014-0

Volume 198, Article 105047 · Citations: 0
DRViT: A dynamic redundancy-aware vision transformer accelerator via algorithm and architecture co-design on FPGA
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-01-28 · DOI: 10.1016/j.jpdc.2025.105042
Xiangfeng Sun, Yuanting Zhang, Qinyu Wang, Xiaofeng Zou, Yujia Liu, Ziqian Zeng, Huiping Zhuang

Abstract: Multi-modal artificial intelligence (MAI) has attracted significant interest due to its capability to process and integrate data from multiple modalities, including images, text, and audio. Addressing MAI tasks in distributed systems necessitates robust and efficient architectures, and the Transformer has emerged as the primary network in this context. Integrating Vision Transformers (ViTs) within multimodal frameworks is crucial for enhancing the processing and comprehension of image data across diverse modalities. However, the complex architecture of ViTs and the extensive resources required for processing large-scale image data impose high computational and storage demands, which are particularly challenging when deploying ViTs on edge devices within distributed frameworks. To address this issue, we propose DRViT, a novel dynamic redundancy-aware ViT accelerator based on parallel computing and supported by an algorithm and architecture co-design. We first propose a hardware-friendly lightweight algorithm featuring token merging, token pruning, and an INT8 quantization scheme. We then design a specialized architecture to support this algorithm, translating the lightweight algorithm into significant latency and energy-efficiency improvements. Our design is implemented on the Xilinx Alveo U250, achieving an overall inference latency of 0.86 ms and 1.17 ms per image for ViT-tiny at 140 MHz and 100 MHz, respectively. The throughput reaches 1,380 GOP/s at peak, demonstrating superior performance compared to state-of-the-art accelerators, even at lower frequencies.

Volume 199, Article 105042 · Citations: 0
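Of the lightweight techniques named in the abstract, INT8 quantization is the easiest to illustrate. The NumPy sketch below uses a common symmetric per-tensor scheme; it is a generic textbook example and does not reflect DRViT's actual quantization scheme or FPGA datapath.

import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map floats into [-127, 127]."""
    max_abs = np.max(np.abs(x))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)
max_err = np.max(np.abs(x - x_hat))    # bounded by roughly scale / 2

Token merging and pruning are analogous in spirit: both shrink the number of tokens processed per image before the arithmetic ever happens, while quantization shrinks the cost of each arithmetic operation.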
Latency-aware placement of stream processing operators in modern-day stream processing frameworks
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-01-27 · DOI: 10.1016/j.jpdc.2025.105041
Raphael Ecker, Vasileios Karagiannis, Michael Sober, Stefan Schulte

Abstract: The rise of the Internet of Things has substantially increased the number of interconnected devices at the edge of the network. As a result, a large number of computations are now distributed across the compute continuum, spanning from the edge to the cloud and generating vast amounts of data. Stream processing is typically employed to process this data in near real time because it handles continuous streams of information in a scalable manner. However, many stream processing approaches do not consider the underlying network devices of the compute continuum as candidate resources for processing data. Moreover, many existing works do not consider the network latency incurred by performing computations on multiple devices in a distributed way. To avoid this, we formulate an optimization problem for utilizing the complete compute continuum resources and design heuristics to solve this problem efficiently. Furthermore, we integrate our heuristics into Apache Storm and perform experiments that show latency- and throughput-related benefits compared to alternatives.

Volume 199, Article 105041 · Citations: 0
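As rough intuition for the placement problem described above, the toy greedy heuristic below assigns each operator of a chain to the device that minimizes added link latency plus processing latency. The device model, cost numbers, and greedy rule are our own simplifications, not the paper's formulation or its Apache Storm integration.

# Toy model: operators form a chain; each device has a processing latency per operator,
# and network_latency[(a, b)] is the link latency between devices a and b (in ms).
processing_latency = {"edge": 6.0, "fog": 3.0, "cloud": 1.0}
network_latency = {
    ("edge", "edge"): 0.0, ("edge", "fog"): 2.0, ("edge", "cloud"): 20.0,
    ("fog", "edge"): 2.0, ("fog", "fog"): 0.0, ("fog", "cloud"): 15.0,
    ("cloud", "edge"): 20.0, ("cloud", "fog"): 15.0, ("cloud", "cloud"): 0.0,
}

def place_operators(num_operators, source_device="edge"):
    """Greedily place each operator of a chain on the device minimizing incremental latency."""
    placement, prev = [], source_device
    for _ in range(num_operators):
        best = min(processing_latency,
                   key=lambda d: network_latency[(prev, d)] + processing_latency[d])
        placement.append(best)
        prev = best
    return placement

print(place_operators(3))   # ['fog', 'fog', 'fog'] under these toy numbers

A greedy pass like this ignores global effects (e.g., overloading a single device), which is exactly why the paper formulates the placement as an optimization problem and designs dedicated heuristics.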
Skyward secure: Advancing drone data-sharing in 6G with decentralized dataspace and supported technologies
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-01-21 · DOI: 10.1016/j.jpdc.2025.105040
Saeed Hamood Alsamhi, Sumit Srivastava, Mamoon Rashid, Amnnah Alhabeeb, Santosh Kumar, Navin Singh Rajput, Ammar Hawbani, Liang Zhao, Mohammed A.A. Al-qaness, Edward Curry

Abstract: Dataspaces enable the distribution of heterogeneous data from several sources and domains and have attracted attention for resolving data integration challenges. Drone data sharing faces challenges such as protecting privacy and security, building trust and dependability, controlling latency and scalability, facilitating real-time data processing, and preserving the quality of shared models. While sixth-generation (6G) networks provide the high throughput and low latency needed to improve drone operations, security issues are exacerbated by the sensitive nature of shared data and the lack of centralized monitoring. To address these challenges, this paper presents a conceptual framework for a Dataspace in the Sky that enables secure and efficient drone data sharing within 6G networks in the transition from Industry 4.0 to Industry 5.0. The Dataspace in the Sky integrates Federated Learning (FL), a decentralized Machine Learning (ML) approach that enhances security and privacy by sharing models instead of raw data, facilitating effective drone collaboration. However, the quality of shared local models often suffers from inconsistent data contributions and unreliable recording mechanisms, which can undermine the performance of FL. To tackle this, the framework employs blockchain (BC) to decentralize and secure the Dataspace, ensuring the integrity of contribution records and improving the reliability of shared models. The resulting decentralized data sharing addresses latency issues by decentralizing decision-making and enhances trust and reliability by leveraging immutable and transparent BC mechanisms. The Dataspace in the Sky solution not only secures drone data-sharing operations in 6G environments but also enables the development of citizen-friendly mobility services, expanding opportunities across smart environments.

Volume 199, Article 105040 · Citations: 0
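The federated learning component referenced above rests on a simple mechanism: clients train locally and share only model parameters, which a coordinator averages. Below is a generic federated-averaging (FedAvg) sketch over NumPy weight vectors with a toy linear-regression objective of our own; it ignores the paper's blockchain and 6G aspects entirely.

import numpy as np

def local_update(weights, data_x, data_y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on a linear-regression loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * data_x.T @ (data_x @ w - data_y) / len(data_y)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server aggregation: weight each client's model by its number of samples."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
for _ in range(10):                                   # communication rounds
    local_models = [local_update(global_w, x, y) for x, y in clients]
    global_w = federated_average(local_models, [len(y) for _, y in clients])

Only the parameter vectors cross the network here, which is the privacy property the abstract leans on; the paper's contribution-recording and blockchain layers sit on top of this basic loop.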
FASNet: Federated adversarial Siamese networks for robust malware image classification
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-01-16 · DOI: 10.1016/j.jpdc.2025.105039
Namrata Govind Ambekar, Sonali Samal, N. Nandini Devi, Surmila Thokchom

Abstract: Malware detection faces considerable challenges due to the ever-evolving and complex nature of cyber threats. Various deep learning models have demonstrated effectiveness in identifying malware within organizations. However, developing a reliable distributed malware detection model using diverse data from multiple sources faces significant challenges, worsened by privacy concerns, data distribution issues, and the absence of balanced datasets, and therefore requires advanced data privacy techniques. To address this, we introduce FASNet, a novel privacy-centric distributed malware detection model designed to enhance detection accuracy and robustness. FASNet employs state-of-the-art Siamese networks as feature extractors and incorporates two significant advancements: federated learning and adversarial training. Federated learning, implemented with a client size of three, ensures that model training is conducted on individual devices, eliminating the need for centralized data collection and addressing data privacy concerns. This design also prevents data dilution and communication overhead while maintaining effective training on each device. Additionally, adversarial training using the Fast Gradient Sign Method (FGSM) generates adversarial images to strengthen the model's resilience. By training on both original and adversarial malware images, FASNet improves its ability to accurately classify malware images that have been intentionally perturbed to mislead the system. Experimental results on the Blended dataset demonstrate the efficacy of the proposed approach, achieving a testing accuracy of 0.9510, precision of 0.9417, recall of 0.9510, F1 score of 0.9384, Matthews Correlation Coefficient (MCC) of 0.9464, Jaccard Index (JI) of 0.9271, and Fowlkes-Mallows Index (FMI) of 0.9725. These findings show that FASNet effectively tackles two main challenges: privacy-centric malware detection and imbalanced datasets.

Volume 198, Article 105039 · Citations: 0
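FGSM, the adversarial-training ingredient named in this abstract, perturbs an input by epsilon in the direction of the sign of the loss gradient with respect to that input. The sketch below applies it to a toy logistic-regression model rather than FASNet's Siamese network, so the model and loss are stand-ins of our own.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon=0.1):
    """FGSM on binary cross-entropy for a logistic-regression model p = sigmoid(w·x + b).

    The gradient of the loss w.r.t. the input is (p - y) * w, so the adversarial
    example moves each input feature by epsilon in the sign of that gradient.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w                  # d(BCE)/dx for this simple model
    return x + epsilon * np.sign(grad_x)

w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, 0.4, -0.1])
x_adv = fgsm_perturb(x, y=1.0, w=w, b=b, epsilon=0.1)
# Training on both x and x_adv (adversarial training) is what hardens the classifier.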
Distributed robust multitask clustering in wireless sensor networks using Multi-Factorial Evolutionary Algorithm
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-01-16 · DOI: 10.1016/j.jpdc.2025.105038
Anita Panwar, Satyasai Jagannath Nanda

Abstract: When the data collected at the local nodes of a wireless sensor network (WSN) are volumetric in nature, local processing is required, and distributed clustering plays an important role. Traditional clustering algorithms based on K-means and K-medoids are not effective at accurately segregating data in these scenarios. Further, techniques are needed that can effectively handle outliers and noise in the sensed data; hence robust distributed data clustering algorithms must be designed. Multi-Task Optimization (MTO) has drawn researchers' attention in the last couple of years since the introduction of the Multi-Factorial Evolutionary Algorithm (MFEA). MFEA can handle several single-objective tasks that are usually related to one another and share implicit knowledge or abilities. In this manuscript, MFEA is employed to solve two tasks: 1) outlier detection and 2) distributed clustering at the nodes of a WSN. The resulting algorithm, termed Distributed MFEA (DMFEA), effectively removes noise and segregates the data present at multiple WSN nodes. A simulation study reveals the superior performance of DMFEA over benchmark algorithms such as distributed versions of K-means, particle swarm optimization, and moth-flame optimization on two synthetic and six real-life datasets covering forest fire monitoring, air pollution indexing, Intel laboratory environment sensing, agricultural soil quality labeling, river water quality analysis, and land mine detection. The superior performance of DMFEA is demonstrated using the Silhouette Index of the obtained clusters and the percentage of outliers detected. Additionally, DMFEA achieves a better average rank than the three comparative algorithms in the Kruskal-Wallis test.

Volume 198, Article 105038 · Citations: 0
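The Silhouette Index used for evaluation above is a standard clustering metric rather than part of DMFEA itself; the small NumPy sketch below computes it from scratch for small datasets (scikit-learn's silhouette_score provides the same metric for real workloads).

import numpy as np

def silhouette_index(X, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) per point, averaged.

    a = mean distance to points in the same cluster,
    b = smallest mean distance to points in any other cluster.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i, li in enumerate(labels):
        same = (labels == li)
        same[i] = False
        if not same.any():                            # singleton cluster: score is 0
            scores.append(0.0)
            continue
        a = dists[i, same].mean()
        b = min(dists[i, labels == lj].mean() for lj in set(labels) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
print(silhouette_index(X, np.array([0, 0, 1, 1])))    # close to 1 for well-separated clusters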
GreediRIS: Scalable influence maximization using distributed streaming maximum cover
IF 3.4 · CAS Tier 3 · Computer Science
Journal of Parallel and Distributed Computing · Pub Date: 2025-01-14 · DOI: 10.1016/j.jpdc.2025.105037
Reet Barik, Wade Cappa, S.M. Ferdous, Marco Minutoli, Mahantesh Halappanavar, Ananth Kalyanaraman

Abstract: Influence maximization, the problem of identifying a subset of k influential seeds (vertices) in a network, is a classical problem in network science with numerous applications. The problem is NP-hard, but efficient polynomial-time approximations exist. However, scaling these algorithms remains a daunting task due to the complexities of the steps involving stochastic sampling and large-scale aggregation. In this paper, we present a new parallel distributed approximation algorithm for influence maximization with provable approximation guarantees. Our approach, which we call GreediRIS, leverages the RandGreedi framework, a state-of-the-art approach for distributed submodular optimization, to solve the step that computes a maximum k cover. GreediRIS combines distributed and streaming models of computation, along with pruning techniques, to effectively address the communication bottlenecks of the algorithm. Experimental results on up to 512 nodes (32K cores) of the NERSC Perlmutter supercomputer show that GreediRIS achieves good strong-scaling performance, preserves quality, and significantly outperforms other state-of-the-art distributed implementations. For instance, on 512 nodes, the most performant variant of GreediRIS achieves geometric-mean speedups of 28.99× and 36.35× for two different diffusion models over a state-of-the-art parallel implementation. We also present a communication-optimized version of GreediRIS that further improves the speedups by two orders of magnitude.

Volume 198, Article 105037 · Citations: 0
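The maximum k-cover step that GreediRIS distributes has a classic sequential greedy baseline with a (1 - 1/e) approximation guarantee. The sketch below shows only that baseline, with hypothetical seed and sample names, and none of the distributed or streaming machinery described in the abstract.

def greedy_max_cover(sets, k):
    """Pick k sets greedily, each time taking the set covering the most uncovered elements.

    `sets` maps a candidate id (e.g., a seed vertex) to the set of elements it covers
    (e.g., reverse-reachable samples). Returns the chosen ids and the covered elements.
    """
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda s: len(sets[s] - covered))
        if not sets[best] - covered:                  # nothing new can be covered
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

rr_sets = {
    "v1": {0, 1, 2, 3},
    "v2": {2, 3, 4},
    "v3": {4, 5},
    "v4": {0, 5},
}
seeds, covered = greedy_max_cover(rr_sets, k=2)
print(seeds)   # ['v1', 'v3'] covers all six samples

The communication bottleneck tackled by GreediRIS arises because, at scale, the candidate sets live on many machines, so even this simple greedy selection requires careful aggregation.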