{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00041-3","DOIUrl":"10.1016/S0743-7315(25)00041-3","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105074"},"PeriodicalIF":3.4,"publicationDate":"2025-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143785399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fengyi Huang , Wenhua Wang , Jianxiong Guo , Wentao Fan , Yang Xu , Tian Wang , Jiannong Cao
{"title":"Price-aware resource management for multi-modal DNN inference in collaborative heterogeneous edge environments","authors":"Fengyi Huang , Wenhua Wang , Jianxiong Guo , Wentao Fan , Yang Xu , Tian Wang , Jiannong Cao","doi":"10.1016/j.jpdc.2025.105080","DOIUrl":"10.1016/j.jpdc.2025.105080","url":null,"abstract":"<div><div>To address the limitations of ARM64-based AI edge devices, which are energy-efficient but computationally constrained, as well as general-purpose edge servers, this paper proposes a multi-modal Collaborative Heterogeneous Edge Computing (CHEC) architecture that achieves low latency and enhances computational capabilities. The CHEC framework, which is segmented into an edge private cloud and an edge public cloud, endeavors to optimize the profits of Edge Service Providers (ESPs) through dynamic heterogeneous resource management. In particular, it is achieved by formulating the challenge as a multi-stage Mixed-Integer Nonlinear Programming (MINLP) problem. We introduce a resource collaboration system based on resource leasing incorporating three Economic Payment Models (EPMs), ensuring efficient and profitable resource utilization. To tackle this complex issue, we develop a three-layer Hybrid Deep Reinforcement Learning (HDRL) algorithm with EPMs, HDRL-EPMs, for efficient management of dynamic and heterogeneous resources. Extensive simulations confirm the algorithm's ability to ensure convergence and approximate optimal solutions, significantly outperforming existing methods. 
Testbed experiments demonstrate that the CHEC architecture reduces latency by up to 21.83% in real-world applications, markedly surpassing previous approaches.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105080"},"PeriodicalIF":3.4,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded scaffolding for teaching and assessing inquiry-based hands-on laboratory on distributed systems","authors":"Jordi Guitart","doi":"10.1016/j.jpdc.2025.105082","DOIUrl":"10.1016/j.jpdc.2025.105082","url":null,"abstract":"<div><h3>Context</h3><div>Information Technology education must cultivate proficiency in distributed systems, including strong hands-on laboratory skills, to meet the needs of society and industry. Given the complexity of distributed systems, any successful methodology to teach them to novice students must be scaffolded appropriately to ensure that the students acquire the required degree of expertise.</div></div><div><h3>Objective</h3><div>We propose a comprehensive scaffolding approach for inquiry-based hands-on laboratories in a distributed systems course, which guides not only the learning process, but also its assessment. The approach is based mainly on embedded scaffolds, namely explicit coding and experimental milestones and open questions with predefined grades, but also features contingent scaffolds provided by the teacher when additional assistance is needed.</div></div><div><h3>Method</h3><div>We apply the methodology in the context of the subject ‘Distributed Network Systems’ offered by our university. We compare the students' performance during three academic courses using the proposed methodology with that of the three previous courses, which still used the former methodology. 
We use both visual representations and planned Analysis of Variance (ANOVA) tests to verify our hypothesis defined as a complex contrast.</div></div><div><h3>Findings</h3><div>We find that there is a statistically significant improvement in the students' performance when using the new methodology, both in their grades of the assignments (<em>F</em>(1, 75.364) = 17.770, <span><math><mi>p</mi><mo>=</mo><mn>6.85</mn><mo>×</mo><msup><mrow><mn>10</mn></mrow><mrow><mo>−</mo><mn>5</mn></mrow></msup></math></span>) and, more importantly, also in their grades of the exam questions about the practicals (<em>F</em>(1, 123.186) = 13.285, <span><math><mi>p</mi><mo>=</mo><mn>3.93</mn><mo>×</mo><msup><mrow><mn>10</mn></mrow><mrow><mo>−</mo><mn>4</mn></mrow></msup></math></span>).</div></div><div><h3>Implications</h3><div>Our results encourage other instructors to incorporate embedded scaffolds for teaching and assessing their hands-on laboratories on distributed systems.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105082"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zeyi Li , Minyao Liu , Pan Wang , Wangyu Su , Tianshui Chang , Xuejiao Chen , Xiaokang Zhou
{"title":"Multi-ARCL: Multimodal adaptive relay-based distributed continual learning for encrypted traffic classification","authors":"Zeyi Li , Minyao Liu , Pan Wang , Wangyu Su , Tianshui Chang , Xuejiao Chen , Xiaokang Zhou","doi":"10.1016/j.jpdc.2025.105083","DOIUrl":"10.1016/j.jpdc.2025.105083","url":null,"abstract":"<div><div>Encrypted Traffic Classification (ETC) using Deep Learning (DL) faces two bottlenecks: homogeneous network traffic representation and ineffective model updates. Currently, multimodal DL combined with Continual Learning (CL) approaches mitigates the above problems but overlooks silent applications, whose traffic is absent because guideline violations lead developers to cease their operation and maintenance. Specifically, silent applications accelerate the decay of model stability, while new and active applications challenge model plasticity. This paper presents Multi-ARCL, a multimodal adaptive replay-based distributed CL framework for ETC. The framework prioritizes using crypto-semantic information from flow payloads and flow statistical features to represent traffic. Additionally, the framework proposes an adaptive relay-based continual learning method that effectively eliminates silent neurons and retrains new samples and a limited subset of old ones. Exemplars of silent applications are selectively removed during new task training. To enhance training efficiency, the framework uses distributed learning to quickly address the stability-plasticity dilemma and reduce the cost of storing silent applications. 
Experiments show that ARCL outperforms state-of-the-art methods, with an accuracy improvement of over 8.64% on the NJUPT2023 dataset.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105083"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143800423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pascal Bouvry , Mats Brorsson , Ramon Canal , Aryan Eftekhari , Siegfried Höfinger , Didier Smets , Harald Köstler , Tomáš Kozubek , Ezhilmathi Krishnasamy , Josep Llosa , Alexandra Lukas-Rother , Xavier Martorell , Dirk Pleiter , Ana Proykova , Maria-Ribera Sancho , Olaf Schenk , Cristina Silvano
{"title":"The European master for HPC curriculum","authors":"Pascal Bouvry , Mats Brorsson , Ramon Canal , Aryan Eftekhari , Siegfried Höfinger , Didier Smets , Harald Köstler , Tomáš Kozubek , Ezhilmathi Krishnasamy , Josep Llosa , Alexandra Lukas-Rother , Xavier Martorell , Dirk Pleiter , Ana Proykova , Maria-Ribera Sancho , Olaf Schenk , Cristina Silvano","doi":"10.1016/j.jpdc.2025.105081","DOIUrl":"10.1016/j.jpdc.2025.105081","url":null,"abstract":"<div><div>The use of High-Performance Computing (HPC) is crucial for addressing various grand challenges. While significant investments are made in digital infrastructures that comprise HPC resources, their realisation, operation, and, in particular, their use critically depend on suitably trained experts. In this paper, we present the results of an effort to design and implement a pan-European reference curriculum for a master's degree in HPC.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105081"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chuang Li , Heshi Wang , Yanhua Wen , Qingyu Shi , Qinyu Wang , Chunhua Hu , Dongchen Wu
{"title":"STVAI: Exploring spatio-temporal similarity for scalable and efficient intelligent video inference","authors":"Chuang Li , Heshi Wang , Yanhua Wen , Qingyu Shi , Qinyu Wang , Chunhua Hu , Dongchen Wu","doi":"10.1016/j.jpdc.2025.105079","DOIUrl":"10.1016/j.jpdc.2025.105079","url":null,"abstract":"<div><div>The integration of video data computation and inference is a cornerstone for the evolution of multimodal artificial intelligence (MAI). The extensive adoption and optimization of CNN-based frameworks have significantly improved the accuracy of video inference, yet they present substantial challenges for real-time and large-scale computational demands. Existing research primarily exploits the temporal similarity between video frames to reduce redundant computations, but most of it overlooks the spatial similarity within the frames themselves. Hence, we propose STVAI, a scalable and efficient method that leverages both spatial and temporal similarities to accelerate video inference. This approach uses a parallel region merging strategy, which maintains inference accuracy and enhances the sparsity of the computation matrix. Moreover, we have optimized the computation of sparse convolutions by utilizing Tensor Cores, which accelerate dense convolution computations based on the sparsity of the tiles. Experimental results demonstrate that STVAI achieves a stable speedup of 1.25x over cuDNN implementations, with only a 5% decrease in prediction accuracy. STVAI can achieve speedups of up to 1.53x, surpassing existing methods. 
Our method can be directly applied to various CNN architectures for video inference tasks without the need for retraining the model.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105079"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latency and cost-aware consumer group autoscaling in message broker systems","authors":"Diogo Landau , Nishant Saurabh , Xavier Andrade , Jorge G. Barbosa","doi":"10.1016/j.jpdc.2025.105071","DOIUrl":"10.1016/j.jpdc.2025.105071","url":null,"abstract":"<div><div>Message brokers often facilitate communication between data producers and consumers by adding variable-sized messages to ordered distributed queues. Our goal is to determine the number of consumers and consumer partition assignments needed to ensure that the data consumption rate matches the data production rate. We model this problem as a variable item size bin packing problem. As the production rate varies, new consumer–partition assignments are computed, potentially requiring the reallocation of partitions from one consumer to another. During reallocation, data in the queue are not read, leading to increased latency costs. To address this problem, we focus on the multiobjective optimization cost of minimizing the number of consumers and reducing latency. We introduce several heuristic algorithms and compare them to state-of-the-art heuristics. In our experimental setup, the proposed modified worst fit (MWF) heuristic achieves a 48% reduction, with a similar number of consumers, in comparison with the best fit decreasing (BFD) heuristic. In addition, MWF achieves a <span><math><msup><mrow><mn>99</mn></mrow><mrow><mi>t</mi><mi>h</mi></mrow></msup></math></span> percentile latency of 2.24 seconds compared with that of 364.66 with the approach by Kafka using the same number of consumers. 
Alternatively, to obtain a lower <span><math><msup><mrow><mn>99</mn></mrow><mrow><mi>t</mi><mi>h</mi></mrow></msup></math></span> percentile latency than our approach does, Kafka requires at least 60% more consumers than our method requires.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105071"},"PeriodicalIF":3.4,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing the layout of embedding BCube into grid architectures","authors":"Paul Immanuel, A. Berin Greeni","doi":"10.1016/j.jpdc.2025.105070","DOIUrl":"10.1016/j.jpdc.2025.105070","url":null,"abstract":"<div><div>The storage, processing, and distribution of enormous volumes of data are made possible by data centers, which are vital components of the contemporary computing infrastructure. The BCube network is an important data center network that was developed for modular data centers based on shipping containers. Network embedding of data centers into certain topologies offers several benefits, including improved scalability, reduced power consumption, enhanced reliability, and improved overall network performance. Embedding a guest graph into a suitable host graph has significant applications, such as virtualizing Network-on-Chip layouts, algorithm portability, and the simulation of parallel architectures. A crucial factor that influences the quality of an embedding is its layout. So far, there have been few results regarding the embedding of graphs into certain data center networks. However, these results were obtained by fixing data center networks as host graphs with linear arrays and cycles as guest graphs. In this work, we investigate the edge isoperimetric features of BCube and embed it into linear arrays and grid structures by considering it as the guest graph. 
This study is the first that we are aware of, on embedding data center networks for minimum layout.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105070"},"PeriodicalIF":3.4,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143738837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The (t,k)-diagnosability of Cayley graph generated by 2-tree","authors":"Lulu Yang , Shuming Zhou , Eddie Cheng","doi":"10.1016/j.jpdc.2025.105068","DOIUrl":"10.1016/j.jpdc.2025.105068","url":null,"abstract":"<div><div>Multiprocessor systems, which typically use interconnection networks (or graphs) as underlying topologies, are widely utilized for big data analysis in scientific computing due to the advancements in technologies such as cloud computing, IoT, and social networks. With the dramatic expansion in the scale of multiprocessor systems, the pursuit and optimization of strategies for identifying faulty processors have become crucial to ensuring the normal operation of high-performance computing systems. System-level diagnosis is a process designed to distinguish between faulty processors and fault-free processors in multiprocessor systems. The <span><math><mo>(</mo><mi>t</mi><mo>,</mo><mi>k</mi><mo>)</mo></math></span>-diagnosis, a generalization of sequential diagnosis, proceeds to identify at least <em>k</em> faulty processors and repair them in each iteration under the assumption that there are at most <em>t</em> faulty processors whenever <span><math><mi>t</mi><mo>≥</mo><mi>k</mi></math></span>. We show that the Cayley graph generated by a 2-tree is <span><math><mo>(</mo><msup><mrow><mn>2</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>3</mn></mrow></msup><mo>,</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn><mo>)</mo></math></span>-diagnosable under the PMC model for <span><math><mi>n</mi><mo>≥</mo><mn>5</mn></math></span> while it is <span><math><mo>(</mo><mfrac><mrow><msup><mrow><mn>2</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>3</mn></mrow></msup><mo>(</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>6</mn><mo>)</mo></mrow><mrow><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn></mrow></mfrac><mo>,</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn><mo>)</mo></math></span>-diagnosable under the MM<sup>⁎</sup> model for <span><math><mi>n</mi><mo>≥</mo><mn>4</mn></math></span>. 
As an empirical case study, the <span><math><mo>(</mo><mi>t</mi><mo>,</mo><mi>k</mi><mo>)</mo></math></span>-diagnosabilities of the alternating group graph <span><math><mi>A</mi><msub><mrow><mi>G</mi></mrow><mrow><mi>n</mi></mrow></msub></math></span> under the PMC model and the MM* model have been determined.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105068"},"PeriodicalIF":3.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143687634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data quality management in big data: Strategies, tools, and educational implications","authors":"Thu Nguyen , Hong-Tri Nguyen , Tu-Anh Nguyen-Hoang","doi":"10.1016/j.jpdc.2025.105067","DOIUrl":"10.1016/j.jpdc.2025.105067","url":null,"abstract":"<div><div>This study addresses the critical need for effective Big Data Quality Management (BDQM) in education, a field where data quality has profound implications but remains underexplored. The work systematically progresses from requirement analysis and standard development to the deployment of tools for monitoring and enhancing data quality in big data workflows. The study's contributions are substantiated through five research questions that explore the impact of data quality on analytics, the establishment of evaluation standards, centralized management strategies, improvement techniques, and education-specific BDQM adaptations. By addressing these questions, the research advances both theoretical and practical frameworks, equipping stakeholders with the tools to enhance the reliability and efficiency of data-driven educational initiatives. Integrating Artificial Intelligence (AI) and distributed computing, this research introduces a novel multi-stage BDQM framework that emphasizes data quality assessment, centralized governance, and AI-enhanced improvement techniques. This work underscores the transformative potential of robust BDQM systems in supporting informed decision-making and achieving sustainable outcomes in educational projects. The survey findings highlight the potential for automated data management within big data architectures, suggesting that data quality frameworks can be significantly enhanced by leveraging AI and distributed computing. 
Additionally, the survey emphasizes emerging trends in big data quality management, specifically (i) automated data cleaning and cleansing and (ii) data enrichment and augmentation.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105067"},"PeriodicalIF":3.4,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143621250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}