Giuseppe Antonio Di Luna , Paola Flocchini , Giuseppe Prencipe , Nicola Santoro
{"title":"Locating a black hole in a dynamic ring","authors":"Giuseppe Antonio Di Luna , Paola Flocchini , Giuseppe Prencipe , Nicola Santoro","doi":"10.1016/j.jpdc.2024.104998","DOIUrl":"10.1016/j.jpdc.2024.104998","url":null,"abstract":"<div><div>In networked environments supporting mobile agents, a pressing problem is the presence of network sites harmful to the agents. In this paper we consider the danger posed by a node that destroys any incoming agent without leaving any trace. Such a dangerous node is known in the literature as a <em>black hole</em> (<span>Bh</span>). The problem of a team of system agents determining its location, known as <em>black hole search</em> (<span>Bhs</span>), has been extensively studied in the literature under a variety of assumptions, in both synchronous and asynchronous settings. The main complexity parameter of <span>Bhs</span> is the number of system agents (called <em>size</em>) needed to solve the problem; other parameters are the number of moves (called <em>cost</em>) performed by the agents, and the <em>time</em> until termination.</div><div>In the existing literature, with only a couple of exceptions, all results are based on the common assumption that the network is <em>static</em>, i.e. its topology does not change over time. We instead consider <span>Bhs</span> when the network is <em>dynamic</em>: the link structure of the graph changes over time. While time-varying graphs have been the focus of intense research in the last two decades, very little is known about the problem of locating the <span>Bh</span> in such networks.</div><div>In this paper, we help fill this research gap by studying <span>Bhs</span> in <em>dynamic ring</em> networks, focusing on <em>1-interval connectivity</em> adversarial dynamics. Feasibility and complexity of the problem depend on many factors, specifically on the size <em>n</em> of the ring, whether or not <em>n</em> is known, and the type of inter-agent communication (whiteboards, tokens, face-to-face, visual). In this paper, we provide a <em>complete</em> feasibility characterization presenting size-optimal algorithms. Furthermore, we establish lower bounds on the cost and time of size-optimal solutions and show that our algorithms achieve those bounds.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 104998"},"PeriodicalIF":3.4,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yingjie Song , Zhuo Tang , Yaohua Wang , Xiong Xiao , Zhizhong Liu , Jing Xia , Kenli Li
{"title":"OASR-WFBP: An overlapping aware start-up sharing gradient merging strategy for efficient communication in distributed deep learning","authors":"Yingjie Song , Zhuo Tang , Yaohua Wang , Xiong Xiao , Zhizhong Liu , Jing Xia , Kenli Li","doi":"10.1016/j.jpdc.2024.104997","DOIUrl":"10.1016/j.jpdc.2024.104997","url":null,"abstract":"<div><div>Wait-Free-Back-Propagation (WFBP) is a practical method for distributed deep learning, but it suffers from high communication overhead. This overhead can be reduced by overlapping gradient communication with computation and by sharing the startup time among multiple gradient communication phases. However, existing optimizations share the startup time greedily and fail to jointly exploit the opportunity to overlap computation and communication. We propose an overlapping aware startup sharing Wait-Free-Back-Propagation (OASR-WFBP). An analytic model is designed to guide the sharing procedure. Evaluations show that OASR-WFBP reduces iteration time by 5%-16% compared with the state-of-the-art WFBP algorithm.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 104997"},"PeriodicalIF":3.4,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-speed turbulent flows towards the exascale: STREAmS-2 porting and performance","authors":"Srikanth Sathyanarayana , Matteo Bernardini , Davide Modesti , Sergio Pirozzoli , Francesco Salvadore","doi":"10.1016/j.jpdc.2024.104993","DOIUrl":"10.1016/j.jpdc.2024.104993","url":null,"abstract":"<div><div>Exascale High Performance Computing (HPC) represents a tremendous opportunity to push the boundaries of Computational Fluid Dynamics (CFD), but despite the consolidated trend towards the use of Graphics Processing Units (GPUs), programmability is still an issue. STREAmS-2 (Bernardini et al. Comput. Phys. Commun. 285 (2023) 108644) is a compressible solver for canonical wall-bounded turbulent flows capable of harvesting the potential of NVIDIA GPUs. Here we extend the already available CUDA Fortran backend with a novel HIP backend targeting AMD GPU architectures. The main implementation strategies are discussed along with a novel Python tool that can generate the HIP and CPU code versions, allowing developers to focus their attention only on the CUDA Fortran backend. Single-GPU performance is analyzed focusing on NVIDIA A100 and AMD MI250x cards, which are currently at the core of several HPC clusters. The gap between peak GPU performance and STREAmS-2 performance is found to be generally smaller for NVIDIA cards. Roofline analysis allows tracing this behavior to unexpectedly different computational intensities of the same kernel using the two cards. Additional single-GPU comparisons are performed to assess the impact of grid size, number of parallelized loops, thread masking and thread divergence. Parallel performance is measured on the two largest EuroHPC pre-exascale systems, LUMI (AMD GPUs) and Leonardo (NVIDIA GPUs). Strong scalability reveals more than 80% efficiency up to 16 nodes for Leonardo and up to 32 for LUMI. Weak scalability shows an impressive efficiency of over 95% up to the maximum number of nodes tested (256 for LUMI and 512 for Leonardo). This analysis shows that STREAmS-2 is the perfect candidate to fully exploit the power of current pre-exascale HPC systems in Europe, allowing users to simulate flows with over a trillion mesh points, thus reducing the gap between the Reynolds numbers achievable in high-fidelity simulations and those of real engineering applications.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 104993"},"PeriodicalIF":3.4,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorenzo Petrosino , Luigi Masi , Federico D'Antoni , Mario Merone , Luca Vollero
{"title":"A zero-knowledge proof federated learning on DLT for healthcare data","authors":"Lorenzo Petrosino , Luigi Masi , Federico D'Antoni , Mario Merone , Luca Vollero","doi":"10.1016/j.jpdc.2024.104992","DOIUrl":"10.1016/j.jpdc.2024.104992","url":null,"abstract":"<div><div>With the increasingly widespread adoption of Healthcare 4.0 practices, new challenges have arisen for the utilization of collected sensitive data. On the one hand, these data have immense potential to unlock valuable insights for personalized medicine, early disease detection, and predictive analysis thanks to the use of Artificial Intelligence. On the other hand, ensuring the protection of patient privacy is of paramount importance to maintain trust and uphold ethical practices within the healthcare system. Classical centralized learning approaches do not fit well with the privacy and security requirements imposed by the law and the sensitivity of the processed data, which is why decentralized learning approaches are gaining ground. Among these, Federated Learning (FL) stands out as a viable alternative, providing greater security and performance comparable to classic centralized learning approaches. However, there are still various attacks targeting the local parameters or gradients updated by the participants. Therefore, we present an architecture combining Zero-Knowledge Proof, FL, and blockchain that also implements the decentralized identifier standard. This architecture supports the execution, management, supervision, and updating of the FL process, guaranteeing the resilience of the system and the reliability and traceability of exchanged data. In order to test the performance, robustness, and implementation costs of the proposed architecture, we develop a case study on the prediction of blood glucose levels in people with type 1 diabetes. The results of our analysis show an improved balance between performance, privacy, and security, guaranteeing high levels of verifiability, thereby proving the proposed architecture suitable for most of the FL processes needed in the healthcare field.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 104992"},"PeriodicalIF":3.4,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142445426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hui Zeng , Tongqing Zhou , Yeting Guo , Zhiping Cai , Fang Liu
{"title":"Towards value-sensitive and poisoning-proof model aggregation for federated learning on heterogeneous data","authors":"Hui Zeng , Tongqing Zhou , Yeting Guo , Zhiping Cai , Fang Liu","doi":"10.1016/j.jpdc.2024.104994","DOIUrl":"10.1016/j.jpdc.2024.104994","url":null,"abstract":"<div><div>Federated Learning (FL) enables collaborative model training without sharing data, but traditional static averaging of local updates leads to poor performance on heterogeneous data. Subsequent remedies, which either schedule data distribution or mitigate local discrepancies, predominantly fail to handle fine-grained heterogeneity (e.g., locally imbalanced labels). We first reveal that static averaging causes the global model to suffer from the <em>mean fallacy</em>. That is, the averaging process favors local models according to the numerical magnitude of their parameters rather than their knowledge. To tackle this, we introduce FedVSA, a simple yet effective model aggregation framework sensitive to heterogeneous local data merits. Specifically, we design a new global loss function for FL that prioritizes valuable local updates, facilitating efficient convergence. We deduce a softmax-based aggregation rule and prove its convergence property via rigorous theoretical analysis. Additionally, we expose model-replacement poisoning threats that exploit the <em>mean fallacy</em>. To mitigate this threat, we propose a two-step mechanism involving auditing historical local training statistics and analyzing the <em>Shapley Value</em>. Through extensive experiments, we show that FedVSA achieves faster convergence (~1.52×) and higher accuracy (~1.6%) compared to the baselines. It also effectively mitigates poisoning attacks by quickly recovering and returning to normal aggregation.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 104994"},"PeriodicalIF":3.4,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142440998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BRFL: A blockchain-based byzantine-robust federated learning model","authors":"Yang Li , Chunhe Xia , Chang Li , Tianbo Wang","doi":"10.1016/j.jpdc.2024.104995","DOIUrl":"10.1016/j.jpdc.2024.104995","url":null,"abstract":"<div><div>With the increasing importance of machine learning, the privacy and security of training data have become a concern. Federated learning, which stores data in distributed nodes and shares only model parameters, has gained significant attention for addressing this concern. However, a challenge arises in federated learning due to the Byzantine attack problem, where malicious local models can compromise the global model's performance during aggregation. This article proposes the <u>B</u>lockchain-based Byzantine-<u>R</u>obust <u>F</u>ederated <u>L</u>earning (BRFL) model, which combines federated learning with blockchain technology. We improve the robustness of federated learning by proposing a new consensus algorithm and a new aggregation algorithm for blockchain-based federated learning. In addition, we modify the block saving rules of the blockchain to reduce the storage pressure on the nodes. Experimental results on public datasets demonstrate the superior Byzantine robustness of our secure aggregation algorithm compared to other baseline aggregation methods, as well as reduced storage pressure on the blockchain nodes.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 104995"},"PeriodicalIF":3.4,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142445427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A lightweight RDMA connection protocol based on post-hoc confirmation","authors":"Ke Wu, Dezun Dong, Weixia Xu","doi":"10.1016/j.jpdc.2024.104991","DOIUrl":"10.1016/j.jpdc.2024.104991","url":null,"abstract":"<div><div>With the increasing scale and complexity of high-performance computing systems, the rising failure rate poses significant challenges for RDMA networks that aim for high bandwidth and low latency. RDMA networks require hardware-level end-to-end reliable data transmission services to avoid the high cost of software failure recovery. The Tianhe HPC interconnection network adopts a NIC-based RDMA reliable connection protocol, RCP. RCP establishes a connection for each message that enters the NIC and releases it after the transmission is complete. However, this introduces an additional round-trip time (RTT) of connection overhead for each message, which severely impacts the performance of networks dominated by short messages in high-performance computing systems. We found that utilization of receiver-side connection resources is consistently low, because maintaining message-granularity connections on the NIC leads to rapid release of connections. Therefore, we propose a lightweight RDMA connection protocol based on post-hoc confirmation, PCP. PCP assumes the receiver has connection resources by default and eliminates the need for confirmation from the receiver before sending a message, thus reducing the connection overhead of almost all messages by one RTT. At the same time, PCP also includes mechanisms to address the special case where the receiver lacks connection resources. Evaluation results demonstrate that PCP significantly improves the performance of short messages and of applications dominated by short messages. Moreover, PCP further reduces the usage of receiver-side connection resources. Additionally, PCP does not experience performance degradation even under large-scale heavy loads and severe endpoint congestion.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"195 ","pages":"Article 104991"},"PeriodicalIF":3.4,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142427635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SpEpistasis: A sparse approach for three-way epistasis detection","authors":"Diogo Marques, Leonel Sousa, Aleksandar Ilic","doi":"10.1016/j.jpdc.2024.104989","DOIUrl":"10.1016/j.jpdc.2024.104989","url":null,"abstract":"<div><div>Epistasis detection is a fundamental application in the areas of bioinformatics and biomedicine, providing important insights regarding the relationship between the human genome and the occurrence of certain diseases. Exhaustive epistasis detection approaches are employed to achieve an accurate and deterministic solution, at the cost of high computational complexity, especially when targeting high-order epistasis. While recent works employ vectorization and cache-blocking techniques to alleviate this burden, these solutions are now limited by the maximum performance of the functional units of computing systems. Thus, to further improve the performance of epistasis detection it is necessary to reduce the number of memory transfers and computations. To tackle this issue, this work proposes SpEpistasis, which performs three-way epistasis detection by relying on sparse features: by storing only the non-zero elements of the dataset, it reduces the number of operations needed for epistasis detection. To achieve this goal, a new hybrid format for representing the input dataset is proposed, which stores a subset of the data in the compressed sparse row (CSR) format. Moreover, new sparse-aware algorithmic approaches are proposed to leverage both the hybrid format and the vector capabilities of current CPUs from Intel, AMD, and ARM. The experimental results show that SpEpistasis provides speedups of up to 3.7× and average speedups of around 1.8× and 1.33× when compared with other state-of-the-art works.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"195 ","pages":"Article 104989"},"PeriodicalIF":3.4,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142323939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and Scalable Federated Learning Framework for Client Data Heterogeneity Based on Optimal Clustering","authors":"Zihan Li , Shuai Yuan , Zhitao Guan","doi":"10.1016/j.jpdc.2024.104990","DOIUrl":"10.1016/j.jpdc.2024.104990","url":null,"abstract":"<div><div>Federated learning is a promising paradigm for applications across a variety of domains. However, there are some challenges that must be addressed in real-world scenarios, particularly the data heterogeneity among participating clients. Most existing studies primarily focus on the issue of non-independent and identically distributed data, but they do not consider the critical aspect of data quality heterogeneity. When low-quality data is contributed by some clients, the efficacy of models trained through the traditional approaches will be significantly compromised. Therefore, we propose ROSCFL, a robust and scalable federated learning framework for client data heterogeneity based on optimal clustering. We first develop a cluster contribution evaluation strategy based on the optimal clustering to quantify the contribution of each cluster. Next, we design a robust model aggregation strategy, which effectively mitigates the impact of low-quality data on the global model by optimizing weight allocation and client sampling. Finally, we introduce a client incorporation mechanism to enhance the scalability of ROSCFL. Extensive experiments have been conducted, and the results demonstrate that ROSCFL achieves strong robustness and scalability, particularly in scenarios wherein data distribution and quality heterogeneity coexist.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"195 ","pages":"Article 104990"},"PeriodicalIF":3.4,"publicationDate":"2024-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142327797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(24)00146-1","DOIUrl":"10.1016/S0743-7315(24)00146-1","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"194 ","pages":"Article 104982"},"PeriodicalIF":3.4,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524001461/pdfft?md5=4b65d789bc9db964e4fbb6b24c70b8aa&pid=1-s2.0-S0743731524001461-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142274618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}