Journal of Parallel and Distributed Computing: Latest Articles, Page 10

OASR-WFBP: An overlapping aware start-up sharing gradient merging strategy for efficient communication in distributed deep learning

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing. Pub Date: 2024-10-17. DOI: 10.1016/j.jpdc.2024.104997

Yingjie Song, Zhuo Tang, Yaohua Wang, Xiong Xiao, Zhizhong Liu, Jing Xia, Kenli Li

Citations: 0

High-speed turbulent flows towards the exascale: STREAmS-2 porting and performance

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 196, Article 104993. Pub Date: 2024-10-15. DOI: 10.1016/j.jpdc.2024.104993

Srikanth Sathyanarayana, Matteo Bernardini, Davide Modesti, Sergio Pirozzoli, Francesco Salvadore

Abstract: Exascale High Performance Computing (HPC) represents a tremendous opportunity to push the boundaries of Computational Fluid Dynamics (CFD), but despite the consolidated trend towards the use of Graphics Processing Units (GPUs), programmability is still an issue. STREAmS-2 (Bernardini et al. Comput. Phys. Commun. 285 (2023) 108644) is a compressible solver for canonical wall-bounded turbulent flows capable of harvesting the potential of NVIDIA GPUs. Here we extend the already available CUDA Fortran backend with a novel HIP backend targeting AMD GPU architectures. The main implementation strategies are discussed along with a novel Python tool that can generate the HIP and CPU code versions, allowing developers to focus their attention only on the CUDA Fortran backend. Single-GPU performance is analyzed focusing on NVIDIA A100 and AMD MI250x cards, which are currently at the core of several HPC clusters. The gap between peak GPU performance and STREAmS-2 performance is found to be generally smaller for NVIDIA cards. Roofline analysis allows tracing this behavior to unexpectedly different computational intensities of the same kernel using the two cards. Additional single-GPU comparisons are performed to assess the impact of grid size, number of parallelized loops, thread masking and thread divergence. Parallel performance is measured on the two largest EuroHPC pre-exascale systems, LUMI (AMD GPUs) and Leonardo (NVIDIA GPUs). Strong scalability reveals more than 80% efficiency up to 16 nodes for Leonardo and up to 32 for LUMI. Weak scalability shows an impressive efficiency of over 95% up to the maximum number of nodes tested (256 for LUMI and 512 for Leonardo). This analysis shows that STREAmS-2 is the perfect candidate to fully exploit the power of current pre-exascale HPC systems in Europe, allowing users to simulate flows with over a trillion mesh points, thus reducing the gap between the Reynolds numbers achievable in high-fidelity simulations and those of real engineering applications.

Citations: 0
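The abstract above reports strong- and weak-scaling efficiencies without stating how they are computed. A minimal sketch of the conventional definitions follows, assuming wall-clock times per run are available; the function names and the timings in the usage note are illustrative and are not taken from the paper.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Strong scaling: total problem size is fixed.

    Ideal run time shrinks in proportion to n_base / n, so the
    efficiency is the ideal time divided by the measured time.
    """
    ideal_time = t_base * n_base / n
    return ideal_time / t_n


def weak_scaling_efficiency(t_base, t_n):
    """Weak scaling: problem size per node is fixed.

    Ideal run time stays constant as nodes are added, so the
    efficiency is the baseline time divided by the measured time.
    """
    return t_base / t_n
```

For example, with hypothetical timings of 100 s on 1 node and 7.5 s on 16 nodes, `strong_scaling_efficiency(100.0, 1, 7.5, 16)` gives roughly 0.83, which would count as "more than 80% efficiency" in the sense used above.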

A zero-knowledge proof federated learning on DLT for healthcare data

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 196, Article 104992. Pub Date: 2024-10-11. DOI: 10.1016/j.jpdc.2024.104992

Lorenzo Petrosino, Luigi Masi, Federico D'Antoni, Mario Merone, Luca Vollero

Abstract: With the increasingly widespread adoption of Healthcare 4.0 practices, new challenges have arisen for the utilization of collected sensitive data. On the one hand, these data have immense potential to unlock valuable insights for personalized medicine, early disease detection, and predictive analysis thanks to the use of Artificial Intelligence. On the other hand, ensuring the protection of patient privacy is of paramount importance to maintain trust and uphold ethical practices within the healthcare system. Classical centralized learning approaches do not fit well with the privacy and security requirements imposed by the law and the sensitivity of treated data, which is why decentralized learning approaches are gaining ground. Among these, Federated Learning (FL) stands out as a viable alternative, providing greater security and performance comparable to classic centralized learning approaches. However, there are still various attacks targeting the local parameters or gradients updated by the participants. Therefore, we present our architecture based on the conjunction of Zero-Knowledge Proof, FL, and blockchain, which also implements the decentralized identifier standard. The adoption of this architecture can grant the execution, management, supervision, and updating of the FL process, guaranteeing the resilience of the system and the reliability and traceability of exchanged data. In order to test the performance, robustness, and implementation costs of the proposed architecture, we develop a case study on the prediction of blood glucose levels in people with Type 1 diabetes. The results of our analysis show an improved system in terms of balance between performance, privacy, and security, guaranteeing high levels of verifiability, therefore proving the proposed architecture suitable for most of the FL processes needed in the healthcare field.

Citations: 0

Towards value-sensitive and poisoning-proof model aggregation for federated learning on heterogeneous data

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 196, Article 104994. Pub Date: 2024-10-11. DOI: 10.1016/j.jpdc.2024.104994

Hui Zeng, Tongqing Zhou, Yeting Guo, Zhiping Cai, Fang Liu

Abstract: Federated Learning (FL) enables collaborative model training without sharing data, but traditional static averaging of local updates leads to poor performance on heterogeneous data. The following remedies, either by scheduling data distribution or mitigating local discrepancies, predominantly fail to handle fine-grained heterogeneity (e.g., local imbalanced labels). To commence, we reveal that static averaging leads to the global model suffering from the mean fallacy. That is, the averaging process favors the local model with numerically large parameters rather than the one with more knowledge. To tackle this, we introduce FedVSA, a simple-yet-effective model aggregation framework sensitive to heterogeneous local data merits. Specifically, we invent a new global loss function for FL by prioritizing the valuable local updates, facilitating efficient convergence. We deduce a softmax-based aggregation rule and prove its convergence property via rigorous theoretical analysis. Additionally, we expose poisoning threats of model replacement that utilize the mean fallacy for attacks. To mitigate this threat, we propose a two-step mechanism involving auditing historic local training statistics and analyzing the Shapley Value. Through extensive experiments, we show that FedVSA achieves faster convergence (~1.52×) and higher accuracy (~1.6%) compared to the baselines. It also effectively mitigates poisoning attacks by agilely recovering and returning to normal aggregation.

Citations: 0
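The abstract above contrasts static averaging with a softmax-based aggregation rule but does not give the rule itself. The sketch below shows only the general idea of weighting local models by a softmax over per-client scores. The `scores` input, the `temperature` parameter, and both function names are assumptions made for illustration; this is not FedVSA's actual rule, whose score definition is not stated in the listing.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of per-client scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(local_models, scores, temperature=1.0):
    """Weighted sum of local parameter vectors.

    local_models: list of equal-length lists of floats (one per client).
    scores: hypothetical per-client "value" scores; higher means the
            client's update receives more weight.
    """
    weights = softmax(scores, temperature)
    dim = len(local_models[0])
    return [
        sum(w * model[j] for w, model in zip(weights, local_models))
        for j in range(dim)
    ]
```

With equal scores the weights are uniform and the result reduces to plain averaging; with unequal scores, a client whose parameters are numerically large no longer dominates unless its score is also high.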

BRFL: A blockchain-based byzantine-robust federated learning model

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 196, Article 104995. Pub Date: 2024-10-10. DOI: 10.1016/j.jpdc.2024.104995

Yang Li, Chunhe Xia, Chang Li, Tianbo Wang

Abstract: With the increasing importance of machine learning, the privacy and security of training data have become a concern. Federated learning, which stores data in distributed nodes and shares only model parameters, has gained significant attention for addressing this concern. However, a challenge arises in federated learning due to the byzantine attack problem, where malicious local models can compromise the global model's performance during aggregation. This article proposes the Blockchain-based Byzantine-Robust Federated Learning (BRFL) model, which combines federated learning with blockchain technology. We improve the robustness of federated learning by proposing a new consensus algorithm and aggregation algorithm for blockchain-based federated learning. Meanwhile, we modify the block saving rules of the blockchain to reduce the storage pressure of the nodes. Experimental results on public datasets demonstrate the superior byzantine robustness of our secure aggregation algorithm compared to other baseline aggregation methods, and show reduced storage pressure on the blockchain nodes.

Citations: 0

A lightweight RDMA connection protocol based on post-hoc confirmation

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 195, Article 104991. Pub Date: 2024-10-01. DOI: 10.1016/j.jpdc.2024.104991

Ke Wu, Dezun Dong, Weixia Xu

Abstract: With the increasing scale and complexity of high-performance computing systems, the rising failure rate poses significant challenges for RDMA networks that aim for high bandwidth and low latency. RDMA networks require hardware-level end-to-end reliable data transmission services to avoid the high cost of software failure recovery. The Tianhe HPC interconnection network adopts a NIC-based RDMA reliable connection protocol, RCP. RCP establishes a connection for each message that enters the NIC and releases it after the transmission is complete. However, this introduces an additional round-trip time (RTT) of connection overhead for each message, which severely impacts the performance of networks dominated by short messages in high-performance computing systems. We have found that utilization of receiver-side connection resources has been consistently low because maintaining message-grained connections on the NIC results in rapid release of connections. Therefore, we propose a lightweight RDMA connection protocol based on post-hoc confirmation, PCP. PCP assumes the receiver has connection resources by default and eliminates the need for confirmation from the receiver before sending a message, thus reducing the connection overhead of almost all messages by one RTT. At the same time, PCP also includes mechanisms to address the special case where the receiver lacks connection resources. Evaluation results demonstrate that PCP significantly optimizes short messages and applications dominated by short messages. Moreover, PCP further reduces the usage of receiver-side connection resources. Additionally, PCP does not experience performance degradation even under large-scale heavy loads and severe endpoint congestion.

Citations: 0

SpEpistasis: A sparse approach for three-way epistasis detection

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 195, Article 104989. Pub Date: 2024-09-23. DOI: 10.1016/j.jpdc.2024.104989

Diogo Marques, Leonel Sousa, Aleksandar Ilic

Abstract: Epistasis detection is a fundamental application in the areas of bioinformatics and biomedicine, providing important insights regarding the relationship between the human genome and the occurrence of certain diseases. Exhaustive epistasis detection approaches are employed to achieve an accurate and deterministic solution, at the cost of high computational complexity, especially when targeting high-order epistasis. While recent works employ vectorization and cache-blocking techniques to alleviate this burden, these solutions are now limited by the maximum performance of the functional units of computing systems. Thus, to further improve the performance of epistasis detection it is necessary to reduce its number of memory transfers and computations. To tackle this issue, this work proposes SpEpistasis, which performs three-way epistasis detection by relying on sparse features: by storing only the non-zero elements of the dataset, it reduces the number of operations needed for epistasis detection. To achieve this goal, a new hybrid format to represent the input dataset is proposed, which stores a subset of the data in the compressed sparse row format. Moreover, new sparse-aware algorithmic approaches are also proposed in order to leverage both the hybrid format and the vector capabilities of current CPUs from Intel, AMD, and ARM. The experimental results show that SpEpistasis provides a speedup of up to 3.7× and average speedups of around 1.8× and 1.33× when compared with other state-of-the-art works.

Citations: 0
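The abstract above mentions storing part of the dataset in the compressed sparse row (CSR) format. For readers unfamiliar with it, here is a minimal sketch of plain CSR built from a dense matrix. It illustrates only the standard format, not the paper's hybrid layout, and the function names are chosen for this example.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    Returns (values, col_indices, row_ptr), where row i's non-zeros
    occupy positions row_ptr[i] .. row_ptr[i + 1] - 1 of the first two
    arrays.
    """
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:  # only non-zero elements are stored
                values.append(v)
                col_indices.append(j)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_row(values, col_indices, row_ptr, i):
    """Return row i as a list of (column, value) pairs."""
    start, end = row_ptr[i], row_ptr[i + 1]
    return list(zip(col_indices[start:end], values[start:end]))
```

Any loop that walks a row through `csr_row` touches only the stored non-zeros, which is the source of the operation savings the abstract refers to when the data is sparse.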

Robust and Scalable Federated Learning Framework for Client Data Heterogeneity Based on Optimal Clustering

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 195, Article 104990. Pub Date: 2024-09-22. DOI: 10.1016/j.jpdc.2024.104990

Zihan Li, Shuai Yuan, Zhitao Guan

Abstract: Federated learning is a promising paradigm for applications across a variety of domains. However, there are some challenges that must be addressed in real-world scenarios, particularly the data heterogeneity among participating clients. Most existing studies primarily focus on the issue of non-independent and identically distributed data, but they do not consider the critical aspect of data quality heterogeneity. When low-quality data is contributed by some clients, the efficacy of models trained through the traditional approaches will be significantly compromised. Therefore, we propose ROSCFL, a robust and scalable federated learning framework for client data heterogeneity based on optimal clustering. We first develop a cluster contribution evaluation strategy based on the optimal clustering to quantify the contribution of each cluster. Next, we design a robust model aggregation strategy, which effectively mitigates the impact of low-quality data on the global model by optimizing weight allocation and client sampling. Finally, we introduce a client incorporation mechanism to enhance the scalability of ROSCFL. Extensive experiments have been conducted, and the results demonstrate that ROSCFL achieves strong robustness and scalability, particularly in scenarios wherein data distribution and quality heterogeneity coexist.

Citations: 0

Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing. Pub Date: 2024-09-19. DOI: 10.1016/S0743-7315(24)00146-1

Citations: 0

Survey of federated learning in intrusion detection

IF 3.4, Zone 3 (Computer Science)

Journal of Parallel and Distributed Computing, Volume 195, Article 104976. Pub Date: 2024-09-18. DOI: 10.1016/j.jpdc.2024.104976

Hao Zhang, Junwei Ye, Wei Huang, Ximeng Liu, Jason Gu

Abstract: Intrusion detection methods are crucial means to mitigate network security issues. However, the challenges posed by large-scale complex network environments include local information islands, regional privacy leaks, communication burdens, difficulties in handling heterogeneous data, and storage resource bottlenecks. Federated learning has the potential to address these challenges by leveraging widely distributed and heterogeneous data, achieving load balancing of storage and computing resources across multiple nodes, and reducing the risks of privacy leaks and bandwidth resource demands. This paper reviews the process of constructing a federated-learning-based intrusion detection system from the perspective of intrusion detection. Specifically, it outlines six main aspects: application scenario analysis, federated learning methods, privacy and security protection, selection of classification models, data sources and client data distribution, and evaluation metrics, establishing them as key research content. Subsequently, six research topics are extracted based on these aspects. These topics include expanding application scenarios, enhancing aggregation algorithms, enhancing security, enhancing classification models, personalizing models, and utilizing unlabeled data. Furthermore, the paper delves into research content related to each of these topics through in-depth investigation and analysis. Finally, the paper discusses the current challenges faced by research, and suggests promising directions for future exploration.

Citations: 0