IMI-GPU: Inverted multi-index for billion-scale approximate nearest neighbor search with GPUs
Alan Araujo, Willian Barreiros Jr., Jun Kong, Renato Ferreira, George Teodoro
Journal of Parallel and Distributed Computing 200 (2025), Article 105066. doi:10.1016/j.jpdc.2025.105066. Published online 2025-03-04.

Abstract: Similarity search is used in specialized database systems designed to handle multimedia data, which are often represented by high-dimensional features. In this paper, we focus on speeding up the search process with GPUs. This problem has previously been approached by accelerating the Inverted File with Asymmetric Distance Computation algorithm on GPUs (IVFADC-GPU). However, the more recent CPU algorithm, the Inverted Multi-Index (IMI), had not been parallelized, as it was considered too challenging for efficient GPU deployment. We therefore propose IMI-GPU, a novel and efficient GPU version of IMI, based on a redesign of IMI's multi-sequence algorithm that enables efficient GPU execution. Comparing IMI-GPU with IVFADC-GPU on a billion-scale dataset, IMI-GPU achieved speedups of about 3.2× at Recall@1 and 1.9× at Recall@16. The algorithms were compared in a variety of scenarios, and IMI-GPU significantly outperforms IVFADC on GPUs in the majority of tested cases.

{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00027-9","DOIUrl":"10.1016/S0743-7315(25)00027-9","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105060"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143527269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Massive parallel simulation of gas turbine combustion using a fully implicit unstructured solver on the heterogeneous Sunway TaihuLight supercomputer
Fei Gao, Hu Ren, Zhuyin Ren, Ming Liu, Chengpeng Zhao, Guangwen Yang
Journal of Parallel and Distributed Computing 199 (2025), Article 105055. doi:10.1016/j.jpdc.2025.105055. Published online 2025-02-13.

Abstract: Massive parallel simulations of a full annular aeroengine combustor have been achieved on the on-chip heterogeneous Sunway TaihuLight supercomputer. A billion-cell unstructured mesh is generated through grid replication and rotation, together with an efficient geometric matching algorithm that addresses the conformal-interface issue. We developed graph-based and tree-based loop-fusion approaches for the implicit solution procedure of the momentum equation, and we find that strategic data reuse and the separation of vector computation significantly enhance performance on the many-core processor. For the linear system, a finer-grained parallelization based on sparse matrix-vector multiplication and vector computation is validated. Massive parallel tests using 16K processes with 1M cores successfully simulate the turbulent non-premixed combustion in an aeroengine combustor with nearly one billion cells. Compared to the pre-optimization version, the fully accelerated code achieves a 5.48× speedup in overall performance, with a parallel efficiency of up to 59%.

{"title":"Distributed landmark labeling for social networks","authors":"Arda Şener, Hüsnü Yenigün, Kamer Kaya","doi":"10.1016/j.jpdc.2025.105057","DOIUrl":"10.1016/j.jpdc.2025.105057","url":null,"abstract":"<div><div>Distance queries are a fundamental part of many network analysis applications. They can be used to infer the closeness of two users in social networks, the relation between two sites in a web graph, or the importance of the interaction between two proteins or molecules. Being able to answer these queries rapidly has many benefits in the area of network analysis. Pruned Landmark Labeling (<span>Pll</span>) is a technique used to generate an index for a given graph that allows the shortest path queries to be completed in a fraction of the time when compared to a standard breadth-first or a depth-first search-based algorithm. Parallel Shortest-distance Labeling (<span>Psl</span>) reorganizes the steps of <span>Pll</span> for the multithreaded setting and is designed particularly for social networks for which the index sizes can be much larger than what a single server can store. Even for a medium-size, 5 million vertex graph, the index size can be more than 40 GB. This paper proposes a hybrid, shared- and distributed-memory algorithm, DPSL, by partitioning the input graph via a vertex separator. The proposed method improves both the parallel execution time and the maximum memory consumption by distributing both the data and the work across multiple nodes of a cluster. For instance, on a graph with 5M vertices and 150M edges, using 4 nodes, DPSL reduces the execution time and maximum memory consumption by 2.13× and 1.87×, respectively, compared to our improved implementation of <span>Psl</span>.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105057"},"PeriodicalIF":3.4,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143427648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring data science workflows: A practice-oriented approach to teaching processing of massive datasets","authors":"Johannes Schoder , H. Martin Bücker","doi":"10.1016/j.jpdc.2025.105043","DOIUrl":"10.1016/j.jpdc.2025.105043","url":null,"abstract":"<div><div>Massive datasets are typically processed by a sequence of different stages, comprising data acquisition and preparation, data processing, data analysis, result validation, and visualization. In conjunction, these stages form a data science workflow, a key element enabling the solution of data-intensive problems. The complexity and heterogeneity of these stages require a diverse set of techniques and skills. This article discusses a hands-on practice-oriented approach aiming to enable and motivate graduate students to engage with realistic data science workflows. A major goal of the approach is to bridge the gap between academia and industry by integrating programming assignments that implement different data workflows with real-world data. In consecutive assignments, students are exposed to the methodology of solving problems using big data frameworks and are required to implement different data workflows of varying complexity. This practice-oriented approach is well received by students, as confirmed by different surveys.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105043"},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DePoL: Assuring training integrity in collaborative learning via decentralized verification","authors":"Zhicheng Xu , Xiaoli Zhang , Xuanyu Yin , Hongbing Cheng","doi":"10.1016/j.jpdc.2025.105056","DOIUrl":"10.1016/j.jpdc.2025.105056","url":null,"abstract":"<div><div>Collaborative learning enables multiple participants to jointly train complex models but is vulnerable to attacks like model poisoning or backdoor attacks. Ensuring training integrity can prevent these threats by blocking any tampered contributions from affecting the model. However, traditional approaches often suffer from single points of bottleneck or failure in decentralized environments. To address these issues, we propose <span>DePoL</span>, a secure, scalable, and efficient decentralized verification framework based on duplicated execution. <span>DePoL</span> leverages blockchain to distribute the verification tasks across multiple participant-formed groups, eliminating single-point bottlenecks. Within each group, redundant verification and a majority-based arbitration prevent single points of failure. To further enhance security, <span>DePoL</span> introduces a <em>two-stage plagiarism-free commitment scheme</em> to prevent untrusted verifiers from exploiting public on-chain data. Additionally, a <em>hybrid verification method</em> employs fuzzy matching to handle unpredictable reproduction errors, while a “slow path” ensures zero false positives for honest trainers. Our theoretical analysis demonstrates <span>DePoL</span>'s security and termination properties. Extensive evaluations show that <span>DePoL</span> has overhead similar to common distributed machine learning algorithms, while outperforming centralized verification schemes in scalability, reducing training latency by up to 46%. Additionally, <span>DePoL</span> effectively handles reproduction errors with 0 false positives.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105056"},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient GPU-accelerated parallel cross-correlation","authors":"Karel Maděra, Adam Šmelko, Martin Kruliš","doi":"10.1016/j.jpdc.2025.105054","DOIUrl":"10.1016/j.jpdc.2025.105054","url":null,"abstract":"<div><div>Cross-correlation is a data analysis method widely employed in various signal processing and similarity-search applications. Our objective is to design a highly optimized GPU-accelerated implementation that will speed up the applications and also improve energy efficiency since GPUs are more efficient than CPUs in data-parallel tasks. There are two rudimentary ways to compute cross-correlation — a definition-based algorithm that tries all possible overlaps and an algorithm based on the Fourier transform, which is much more complex but has better asymptotical time complexity. We have focused mainly on the definition-based approach which is better suited for smaller input data and we have implemented multiple CUDA-enabled algorithms with multiple optimization options. The algorithms were evaluated on various scenarios, including the most typical types of multi-signal correlations, and we provide empirically verified optimal solutions for each of the studied scenarios.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105054"},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU memory usage optimization for backward propagation in deep network training
Ding-Yong Hong, Tzu-Hsien Tsai, Ning Wang, Pangfeng Liu, Jan-Jan Wu
Journal of Parallel and Distributed Computing 199 (2025), Article 105053. doi:10.1016/j.jpdc.2025.105053. Published online 2025-02-11.

Abstract: In modern deep learning, the trend has been to design ever-larger deep neural networks (DNNs) for more complex tasks and better accuracy, and convolutional neural networks (CNNs) have become the standard method for most computer vision tasks. However, the memory allocated for intermediate data in convolution layers can cause severe memory pressure during model training. Many solutions have been proposed to address this problem. Beyond hardware-dependent solutions, the general methodology of rematerialization reduces GPU memory usage by trading computation for memory: a subset of intermediate results from the forward phase is selected as checkpoints and kept in memory, and the backward phase recomputes the remaining intermediate data from the closest checkpoints as needed. This recomputation increases execution time but saves memory by not storing all intermediate results during the forward phase. In this paper, we focus on efficiently finding the checkpoint subset that minimizes peak memory usage during model training. We first describe the theoretical background of neural-network training in mathematical terms and use these equations to identify all essential data required during both the forward and backward phases to compute the gradients of the model weights. We then formulate the checkpoint-selection problem and propose a dynamic-programming algorithm with O(n³) time complexity to find the optimal checkpoint subset. Guided by extensive experiments and tracing, we refine the problem description and objective function from our theoretical analysis, and propose an O(n)-time algorithm for finding the optimal checkpoint subset.

An introductory-level undergraduate CS course that introduces parallel computing
Tia Newhall, Kevin C. Webb, Vasanta Chaganti, Andrew Danner
Journal of Parallel and Distributed Computing 199 (2025), Article 105044. doi:10.1016/j.jpdc.2025.105044. Published online 2025-02-04.

Abstract: We present the curricular design, pedagogy, and goals of an introductory-level course on computer systems that introduces parallel and distributed computing (PDC) to students who have only a CS1 background. With the ubiquity of multicore processors, cloud computing, and hardware accelerators, PDC topics have become fundamental knowledge areas in the undergraduate CS curriculum. As a result, it is increasingly important for students to learn a common core of introductory parallel and distributed computing topics and to develop parallel thinking skills early in their CS studies. Our introductory-level course focuses on three main curricular goals: 1) understanding how a computer runs a program, 2) evaluating the system costs associated with running a program, and 3) taking advantage of the power of parallel computing. We elaborate on the goals and details of our course's key modules, and we discuss our pedagogical approach, which includes active-learning techniques. We also include an evaluation of our course and a discussion of our experiences teaching it since Fall 2012. We find that the PDC foundation gained through early exposure in our course helps students gain confidence in their ability to expand and apply their understanding of PDC concepts throughout their CS education.
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00014-0","DOIUrl":"10.1016/S0743-7315(25)00014-0","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"198 ","pages":"Article 105047"},"PeriodicalIF":3.4,"publicationDate":"2025-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143129227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}