{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00079-6","DOIUrl":"10.1016/S0743-7315(25)00079-6","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105112"},"PeriodicalIF":3.4,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144105472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tian Chen , Yu-an Tan , Thar Baker , Haokai Wu , Qiuyu Zhang , Yuanzhang Li
{"title":"ConCeal: A Winograd convolution code template for optimising GCU in parallel","authors":"Tian Chen , Yu-an Tan , Thar Baker , Haokai Wu , Qiuyu Zhang , Yuanzhang Li","doi":"10.1016/j.jpdc.2025.105108","DOIUrl":"10.1016/j.jpdc.2025.105108","url":null,"abstract":"<div><div>By minimising arithmetic operations, Winograd convolution substantially reduces the computational complexity of convolution, a pivotal operation in the training and inference stages of Convolutional Neural Networks (CNNs). This study leverages the hardware architecture and capabilities of Shanghai Enflame Technology's AI accelerator, the General Computing Unit (GCU). We develop a code template named ConCeal for Winograd convolution with 3 × 3 kernels, employing a set of interrelated optimisations, including task partitioning, memory layout design, and parallelism. These optimisations fully exploit GCU's computing resources by optimising dataflow and parallelizing the execution of tasks on GCU cores, thereby enhancing Winograd convolution. Moreover, the integrated optimisations in the template are efficiently applicable to other operators, such as max pooling. Using this template, we implement and assess the performance of four Winograd convolution operators on GCU. The experimental results showcase that Conceal operators achieve a maximum of 2.04× and an average of 1.49× speedup compared to the fastest GEMM-based convolution implementations on GCU. Additionally, the ConCeal operators demonstrate competitive or superior computing resource utilisation in certain ResNet and VGG convolution layers when compared to cuDNN on RTX2080.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"203 ","pages":"Article 105108"},"PeriodicalIF":3.4,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144114726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zdeněk Hanzálek , Ondřej Benedikt , Přemysl Šůcha , Pavel Zaykov , Michal Sojka
{"title":"Thermal modeling and optimal allocation of avionics safety-critical tasks on heterogeneous MPSoCs","authors":"Zdeněk Hanzálek , Ondřej Benedikt , Přemysl Šůcha , Pavel Zaykov , Michal Sojka","doi":"10.1016/j.jpdc.2025.105107","DOIUrl":"10.1016/j.jpdc.2025.105107","url":null,"abstract":"<div><div>Multi-Processor Systems-on-Chip (MPSoC) can deliver high performance needed in many industrial domains, including aerospace. However, their high power consumption, combined with avionics safety standards, brings new thermal management challenges. This paper investigates techniques for offline thermal-aware allocation of periodic tasks on heterogeneous MPSoCs running at a fixed clock frequency, as required in avionics. The goal is to find the assignment of tasks to (i) cores and (ii) temporal isolation windows, as required in ARINC 653 standard, while minimizing the MPSoC temperature. To achieve that, we formulate a new optimization problem, we derive its NP-hardness, and we identify its subproblem solvable in polynomial time. Furthermore, we propose and analyze three power models, and integrate them within several novel optimization approaches based on heuristics, a black-box optimizer, and Integer Linear Programming (ILP). We perform the experimental evaluation on three popular MPSoC platforms (NXP i.MX8QM MEK, NXP i.MX8QM Ixora, NVIDIA TX2) and observe a difference of up to 5.5<!--> <!-->°C among the tested methods (corresponding to a 22% reduction w.r.t. the ambient temperature). We also show that our method, integrating the empirical power model with the ILP, outperforms the other methods on all tested platforms.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"203 ","pages":"Article 105107"},"PeriodicalIF":3.4,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144114761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Román Cárdenas , Patricia Arroba , José L. Risco-Martín
{"title":"Lock-free simulation algorithm to enhance the performance of sequential and parallel DEVS simulators in shared-memory architectures","authors":"Román Cárdenas , Patricia Arroba , José L. Risco-Martín","doi":"10.1016/j.jpdc.2025.105105","DOIUrl":"10.1016/j.jpdc.2025.105105","url":null,"abstract":"<div><div>This paper presents a new algorithm for the Discrete EVent System Specification (DEVS) formalism that improves the performance of simulating complex systems by reducing the number of iterations through the model components in each simulation step. It also minimizes unnecessary visits to model components by propagating simulation routines only when necessary. Additionally, we provide two parallel versions of this new simulation algorithm that use work-stealing scheduling and avoid locking mechanisms without compromising the validity of the execution in shared-memory architectures. We implemented the proposed algorithms in the xDEVS simulator and evaluated their performance using the DEVStone synthetic benchmark. The results show that the proposed algorithms outperform state-of-the-art alternatives. For computationally intensive models, parallel implementations achieve high parallelism efficiency. Furthermore, they are more resilient to model complexity than the sequential algorithm, showing better performance for complex models even without computational overhead in state transition functions.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"203 ","pages":"Article 105105"},"PeriodicalIF":3.4,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144084012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Throughput of Byzantine Broadcast","authors":"Ruomu Hou, Haifeng Yu, Prateek Saxena","doi":"10.1016/j.jpdc.2025.105104","DOIUrl":"10.1016/j.jpdc.2025.105104","url":null,"abstract":"<div><div><em>Byzantine broadcast</em> is a classic problem in distributed computing, with a wide variety of target applications. This work is motivated by the emerging new application of byzantine broadcast in blockchains, which has prompted us to consider the <em>throughput</em> of byzantine broadcast protocols. To our knowledge, this work is the very first to investigate the throughput of byzantine broadcast. We first show that the throughput of existing byzantine broadcast protocols are all far from ideal. We then obtain a simple upper bound on the throughput of byzantine broadcast protocols, showing that no protocol can do better than this upper bound. As the central contribution of this work, we propose a novel byzantine broadcast protocol called <span>OverlayBB</span>. <span>OverlayBB</span> achieves the <em>optimal</em> throughput, and is the very first protocol that can do so. Our protocol does not sacrifice other aspects of the performance.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"203 ","pages":"Article 105104"},"PeriodicalIF":3.4,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144088996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flotilla: A scalable, modular and resilient federated learning framework for heterogeneous resources","authors":"Roopkatha Banerjee , Prince Modi , Jinal Vyas , Chunduru Sri Abhijit , Tejus Chandrashekar , Harsha Varun Marisetty , Manik Gupta , Yogesh Simmhan","doi":"10.1016/j.jpdc.2025.105103","DOIUrl":"10.1016/j.jpdc.2025.105103","url":null,"abstract":"<div><div>With the recent improvements in mobile and edge computing and rising concerns of data privacy, <em>Federated Learning (FL)</em> has rapidly gained popularity as a privacy-preserving, distributed machine learning methodology. Several FL frameworks have been built for testing novel FL strategies. However, most focus on validating the <em>learning</em> aspects of FL through pseudo-distributed simulation but not for deploying on real edge hardware in a distributed manner to meaningfully evaluate the <em>federated</em> aspects from a systems perspective. Current frameworks are also inherently not designed to support asynchronous aggregation, which is gaining popularity, and have limited resilience to client and server failures. We introduce <span>Flotilla</span>, a scalable and lightweight FL framework. It adopts a “user-first” modular design to help rapidly compose various synchronous and asynchronous FL strategies while being agnostic to the DNN architecture. It uses stateless clients and a server design that separates out the session state, which are periodically or incrementally checkpointed. We demonstrate the modularity of <span>Flotilla</span> by evaluating five different FL strategies for training five DNN models. We also evaluate the client and server-side fault tolerance on 200+ clients, and showcase its ability to rapidly failover within seconds. Finally, we show that <span>Flotilla</span>'s resource usage on Raspberry Pis and Nvidia Jetson edge accelerators are comparable to or better than three state-of-the-art FL frameworks, Flower, OpenFL and FedML. It also scales significantly better compared to Flower for 1000+ clients. This positions <span>Flotilla</span> as a competitive candidate to build novel FL strategies on, compare them uniformly, rapidly deploy them, and perform systems research and optimizations.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"203 ","pages":"Article 105103"},"PeriodicalIF":3.4,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144107305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching parallel and distributed computing using data-intensive computing modules","authors":"Michael Gowanlock","doi":"10.1016/j.jpdc.2025.105093","DOIUrl":"10.1016/j.jpdc.2025.105093","url":null,"abstract":"<div><div>Parallel and distributed computing (PDC) courses are useful for computer science (CS) and domain science students. For CS students, PDC is a fundamental field that examines concepts relating to a range of CS subfields, such as algorithms, architecture, simulation, software, systems, among others. Students with domain science backgrounds also require PDC to carry out their research objectives, and the ongoing data revolution has exacerbated this necessity. Given the rise of data science and other data-enabled computational fields, we propose several data-intensive pedagogic modules that are used to teach PDC using message-passing programming with the Message Passing Interface (MPI). These modules employ activities that are interesting, relevant, and accessible to both computer and domain science students enrolled in graduate level programs.</div><div>Using pre- and post-module completion quizzes and anonymous free response surveys, we evaluated the efficacy of the pedagogic modules across four cohorts of students enrolled in a graduate level High Performance Computing (HPC) course at Northern Arizona University. The students have diverse educational backgrounds as some students were enrolled in programs outside of CS. These programs include electrical and computer engineering, mechanical engineering, astronomy & planetary science, bioinformatics, and ecoinformatics. Despite the multi-disciplinary backgrounds of the students, we find that the hands-on application-driven approach to teaching PDC was successful at helping students learn core PDC concepts, and that the modules are useful for facilitating online learning which was required during the COVID-19 pandemic.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105093"},"PeriodicalIF":3.4,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143947307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive behavioural characteristics identification for remote user authentication for cybersecurity","authors":"Ahmet Orun , Emre Orun , Fatih Kurugollu","doi":"10.1016/j.jpdc.2025.105102","DOIUrl":"10.1016/j.jpdc.2025.105102","url":null,"abstract":"<div><div>Nowadays cyber-attacks keep threatening global networks and information infrastructures. Day-by-day, the threat is gradually getting more destructive and harder to counter, as the global networks continue to enlarge exponentially with limited security counter-measures. This occurrence urgently demands more sophisticated methods and techniques, such as multi-factor authentication and soft biometrics to respond to evolving threats. This paper is concerned with behavioural soft biometrics and proposes a multidisciplinary remote cognitive observation technique to meet today’s cybersecurity needs. The proposed method introduces a non-traditional “cognitive psychology” and “artificial intelligence” based approach. According to contemporary cognitive psychology research, human cognitive processes can be affected by many different personal factors and emotional states which are specific to an individual. Those factors mainly include personal perception, memory, decision-making, reasoning, learning, etc. In this study we focus on visual (graphical) perception with the support of graphical stimuli environments and investigate how such personal cognitive factors can be exploited within the cybersecurity area for remote user authentication. This technique enables remote access to the cognitive behavioural parameters of an intruder/hacker without any physical contact via online connection, disregarding the distance of the threat. The results show that cognitive stimuli provide crucial information for a behavioural user authentication system to classify the user as “authentic” or “intruder”. The ultimate goal of this work is to develop a supplementary cognitive cyber security tool for “next generation” secure online banking, finance or trade systems.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105102"},"PeriodicalIF":3.4,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143923475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching parallel and distributed computing in a single undergraduate-level course","authors":"Tia Newhall","doi":"10.1016/j.jpdc.2025.105092","DOIUrl":"10.1016/j.jpdc.2025.105092","url":null,"abstract":"<div><div>As the application of parallel distributed computing (PDC) becomes ever more pervasive, it is increasingly important that undergraduate CS curricula expose students to a wide range of PDC topics in order to prepare them for the workforce. We present the curricular design and learning goals of an upper-level undergraduate course that covers a wide breadth of topics in parallel and distributed computing, while also providing students with depth of experience and development of problem solving, programming, and analysis skills. We discuss lessons learned from our experiences teaching this course over 15 years, and we discuss changes and improvements we have made in its offerings, as well as choices and trade-offs we made to achieve a balance between breadth and depth of coverage across these two huge fields. Evaluations from students support that our approach works well meeting the goals of exposing students to a broad range of PDC topics, building important PDC thinking and programming skills, and meeting other pedagogical goals of an advanced upper-level undergraduate CS course. Although initially designed as a single course due to constraints that are common to smaller schools, our experiences with this course lead us to conclude that it is a good approach for an advanced undergraduate course on PDC at any institution.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105092"},"PeriodicalIF":3.4,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143912437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00065-6","DOIUrl":"10.1016/S0743-7315(25)00065-6","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105098"},"PeriodicalIF":3.4,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}