Zeyi Li, Minyao Liu, Pan Wang, Wangyu Su, Tianshui Chang, Xuejiao Chen, Xiaokang Zhou
{"title":"Multi-ARCL: Multimodal adaptive relay-based distributed continual learning for encrypted traffic classification","authors":"Zeyi Li, Minyao Liu, Pan Wang, Wangyu Su, Tianshui Chang, Xuejiao Chen, Xiaokang Zhou","doi":"10.1016/j.jpdc.2025.105083","DOIUrl":"10.1016/j.jpdc.2025.105083","url":null,"abstract":"<div><div>Encrypted Traffic Classification (ETC) using Deep Learning (DL) faces two bottlenecks: homogeneous network traffic representation and ineffective model updates. Currently, multimodal DL combined with Continual Learning (CL) approaches mitigates these problems but overlooks silent applications, whose traffic is absent because guideline violations lead developers to cease operation and maintenance. Specifically, silent applications accelerate the decay of model stability, while new and active applications challenge model plasticity. This paper presents Multi-ARCL, a multimodal adaptive replay-based distributed CL framework for ETC. The framework prioritizes crypto-semantic information from flow payloads and flow statistical features to represent traffic. Additionally, the framework proposes an adaptive relay-based continual learning method that effectively eliminates silent neurons and retrains on new samples and a limited subset of old ones. Exemplars of silent applications are selectively removed during new task training. To enhance training efficiency, the framework uses distributed learning to quickly address the stability-plasticity dilemma and reduce the cost of storing silent applications. Experiments show that ARCL outperforms state-of-the-art methods, with an accuracy improvement of over 8.64% on the NJUPT2023 dataset.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105083"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143800423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pascal Bouvry, Mats Brorsson, Ramon Canal, Aryan Eftekhari, Siegfried Höfinger, Didier Smets, Harald Köstler, Tomáš Kozubek, Ezhilmathi Krishnasamy, Josep Llosa, Alexandra Lukas-Rother, Xavier Martorell, Dirk Pleiter, Ana Proykova, Maria-Ribera Sancho, Olaf Schenk, Cristina Silvano
{"title":"The European master for HPC curriculum","authors":"Pascal Bouvry, Mats Brorsson, Ramon Canal, Aryan Eftekhari, Siegfried Höfinger, Didier Smets, Harald Köstler, Tomáš Kozubek, Ezhilmathi Krishnasamy, Josep Llosa, Alexandra Lukas-Rother, Xavier Martorell, Dirk Pleiter, Ana Proykova, Maria-Ribera Sancho, Olaf Schenk, Cristina Silvano","doi":"10.1016/j.jpdc.2025.105081","DOIUrl":"10.1016/j.jpdc.2025.105081","url":null,"abstract":"<div><div>The use of High-Performance Computing (HPC) is crucial for addressing various grand challenges. While significant investments are made in digital infrastructures that comprise HPC resources, their realisation, operation, and, in particular, their use critically depend on suitably trained experts. In this paper, we present the results of an effort to design and implement a pan-European reference curriculum for a master's degree in HPC.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105081"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chuang Li, Heshi Wang, Yanhua Wen, Qingyu Shi, Qinyu Wang, Chunhua Hu, Dongchen Wu
{"title":"STVAI: Exploring spatio-temporal similarity for scalable and efficient intelligent video inference","authors":"Chuang Li, Heshi Wang, Yanhua Wen, Qingyu Shi, Qinyu Wang, Chunhua Hu, Dongchen Wu","doi":"10.1016/j.jpdc.2025.105079","DOIUrl":"10.1016/j.jpdc.2025.105079","url":null,"abstract":"<div><div>The integration of video data computation and inference is a cornerstone for the evolution of multimodal artificial intelligence (MAI). The extensive adoption and optimization of CNN-based frameworks have significantly improved the accuracy of video inference, yet they present substantial challenges for real-time and large-scale computational demands. Existing research primarily utilizes the temporal similarity between video frames to reduce redundant computations, but mostly overlooks the spatial similarity within the frames themselves. Hence, we propose STVAI, a scalable and efficient method that leverages both spatial and temporal similarities to accelerate video inference. This approach uses a parallel region merging strategy, which maintains inference accuracy and enhances the sparsity of the computation matrix. Moreover, we have optimized the computation of sparse convolutions by utilizing Tensor Cores, which accelerate dense convolution computations based on the sparsity of the tiles. Experimental results demonstrate that STVAI achieves a stable 1.25x speedup over cuDNN implementations, with only a 5% decrease in prediction accuracy. STVAI can achieve speedups of up to 1.53x, surpassing existing methods. Our method can be directly applied to various CNN architectures for video inference tasks without the need for retraining the model.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105079"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yifei Pu, Xinfeng Xia, Xiaofeng Hou, Chi Wang, Cheng Xu, Jiacheng Liu, Jing Wang, Minyi Guo, Jingling Yuan, Chao Li
{"title":"MMBypass: Towards efficient multi-modal AI computing with adaptive bypass network","authors":"Yifei Pu, Xinfeng Xia, Xiaofeng Hou, Chi Wang, Cheng Xu, Jiacheng Liu, Jing Wang, Minyi Guo, Jingling Yuan, Chao Li","doi":"10.1016/j.jpdc.2025.105078","DOIUrl":"10.1016/j.jpdc.2025.105078","url":null,"abstract":"<div><div>Multi-modal artificial intelligence systems demonstrate superior performance through cross-modal information fusion and processing mechanisms, surpassing conventional unimodal architectures. However, the enhanced computational complexity required for processing heterogeneous data streams in multi-modal frameworks results in elevated inference latency compared to their uni-modal counterparts. This limitation significantly constrains deployment feasibility for real-time and large-scale applications. To address this challenge, we present <em>MMBypass</em>, an adaptive and efficient architecture for multi-modal AI acceleration. Our solution implements intelligent layer-skipping mechanisms through adaptive computational complexity analysis of multi-modal tasks, achieving latency reduction while maintaining predictive accuracy and mitigating model overfitting in specialized scenarios. The architecture's innovation lies in two aspects: 1) We design bypasses for each uni-modal network in multi-modal networks to perform adaptive computing. 2) We design a guider to dynamically choose the optimal bypasses. Distinct from existing methods, <em>MMBypass</em> maintains broad applicability without requiring domain-specific prerequisites, and it shows significantly better performance on data samples of differing difficulty. Empirical evaluations demonstrate our architecture achieves 44.5% average latency reduction while matching or exceeding baseline accuracy across diverse multi-modal benchmarks.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105078"},"PeriodicalIF":3.4,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143825455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meeniga Vijaya Lakshmi , M. Sri Raghavendra , MaddalaVijaya Lakshmi
{"title":"Design of energy-aware sensor networks for climate and pollution monitoring","authors":"Meeniga Vijaya Lakshmi, M. Sri Raghavendra, MaddalaVijaya Lakshmi","doi":"10.1016/j.jpdc.2025.105084","DOIUrl":"10.1016/j.jpdc.2025.105084","url":null,"abstract":"<div><div>The growing concern over climate change and pollution has driven the development of energy-efficient sensor networks for environmental monitoring. This research proposes an energy-aware sensor network using Spanning Tree-Reinforcement Learning (ST-RL) to optimize data accuracy, minimize energy consumption, and extend the network's lifetime. The proposed method achieves significant performance improvements compared to existing approaches. Experimental results demonstrate that ST-RL enhances network lifetime by 28.57%, reduces energy consumption by 41.24%, improves packet delivery ratio by 3.7%, and reduces transmission delay by 10% over traditional methods such as EDAL, FT-EEC, and EAEDAR. The data is collected from multiple environmental sensors, processed using spanning tree algorithms for optimized connectivity, and refined with reinforcement learning to suppress unnecessary transmissions. The results confirm that the proposed ST-RL technique significantly enhances energy efficiency and network reliability, making it a promising solution for large-scale climate and pollution monitoring applications.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105084"},"PeriodicalIF":3.4,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143824009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latency and cost-aware consumer group autoscaling in message broker systems","authors":"Diogo Landau, Nishant Saurabh, Xavier Andrade, Jorge G. Barbosa","doi":"10.1016/j.jpdc.2025.105071","DOIUrl":"10.1016/j.jpdc.2025.105071","url":null,"abstract":"<div><div>Message brokers often facilitate communication between data producers and consumers by adding variable-sized messages to ordered distributed queues. Our goal is to determine the number of consumers and consumer partition assignments needed to ensure that the data consumption rate matches the data production rate. We model this problem as a variable item size bin packing problem. As the production rate varies, new consumer–partition assignments are computed, potentially requiring the reallocation of partitions from one consumer to another. During reallocation, data in the queue are not read, leading to increased latency costs. To address this problem, we focus on the multiobjective optimization cost of minimizing the number of consumers and reducing latency. We introduce several heuristic algorithms and compare them to state-of-the-art heuristics. In our experimental setup, the proposed modified worst fit (MWF) heuristic achieves a 48% reduction, with a similar number of consumers, in comparison with best fit decreasing (BFD). In addition, MWF achieves a <span><math><msup><mrow><mn>99</mn></mrow><mrow><mi>t</mi><mi>h</mi></mrow></msup></math></span> percentile latency of 2.24 seconds, compared with 364.66 seconds for Kafka's approach using the same number of consumers. Alternatively, to obtain a lower <span><math><msup><mrow><mn>99</mn></mrow><mrow><mi>t</mi><mi>h</mi></mrow></msup></math></span> percentile latency than our approach does, Kafka requires at least 60% more consumers than our method requires.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105071"},"PeriodicalIF":3.4,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing the layout of embedding BCube into grid architectures","authors":"Paul Immanuel, A. Berin Greeni","doi":"10.1016/j.jpdc.2025.105070","DOIUrl":"10.1016/j.jpdc.2025.105070","url":null,"abstract":"<div><div>The storage, processing, and distribution of enormous volumes of data are made possible by data centers, which are vital components of the contemporary computing infrastructure. The BCube network is an important data center network developed for modular data centers based on shipping containers. Network embedding of data centers into certain topologies offers several benefits, including improved scalability, reduced power consumption, enhanced reliability, and improved overall network performance. Embedding a guest graph into suitable host graphs has significant applications, such as virtualizing Network-on-Chip layouts, porting algorithms, and simulating parallel architectures. A key factor that influences the quality of an embedding is its layout. So far, there have been few results regarding the embedding of graphs into certain data center networks. However, these results are obtained by fixing data center networks as host graphs with linear arrays and cycles as guest graphs. In this work, we investigate the edge isoperimetric features of BCube and embed it into linear arrays and grid structures by considering it as the guest graph. To our knowledge, this is the first study on embedding data center networks for minimum layout.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105070"},"PeriodicalIF":3.4,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143738837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The (t,k)-diagnosability of Cayley graph generated by 2-tree","authors":"Lulu Yang, Shuming Zhou, Eddie Cheng","doi":"10.1016/j.jpdc.2025.105068","DOIUrl":"10.1016/j.jpdc.2025.105068","url":null,"abstract":"<div><div>Multiprocessor systems, which typically use interconnection networks (or graphs) as underlying topologies, are widely utilized for big data analysis in scientific computing due to advancements in technologies such as cloud computing, IoT, and social networks. With the dramatic expansion in the scale of multiprocessor systems, the pursuit and optimization of strategies for identifying faulty processors have become crucial to ensuring the normal operation of high-performance computing systems. System-level diagnosis is a process designed to distinguish between faulty processors and fault-free processors in multiprocessor systems. The <span><math><mo>(</mo><mi>t</mi><mo>,</mo><mi>k</mi><mo>)</mo></math></span>-diagnosis, a generalization of sequential diagnosis, proceeds to identify at least <em>k</em> faulty processors and repair them in each iteration under the assumption that there are at most <em>t</em> faulty processors whenever <span><math><mi>t</mi><mo>≥</mo><mi>k</mi></math></span>. We show that the Cayley graph generated by a 2-tree is <span><math><mo>(</mo><msup><mrow><mn>2</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>3</mn></mrow></msup><mo>,</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn><mo>)</mo></math></span>-diagnosable under the PMC model for <span><math><mi>n</mi><mo>≥</mo><mn>5</mn></math></span> while it is <span><math><mo>(</mo><mfrac><mrow><msup><mrow><mn>2</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>3</mn></mrow></msup><mo>(</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>6</mn><mo>)</mo></mrow><mrow><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn></mrow></mfrac><mo>,</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn><mo>)</mo></math></span>-diagnosable under the MM<sup>⁎</sup> model for <span><math><mi>n</mi><mo>≥</mo><mn>4</mn></math></span>. As an empirical case study, the <span><math><mo>(</mo><mi>t</mi><mo>,</mo><mi>k</mi><mo>)</mo></math></span>-diagnosabilities of the alternating group graph <span><math><mi>A</mi><msub><mrow><mi>G</mi></mrow><mrow><mi>n</mi></mrow></msub></math></span> under the PMC model and the MM* model have been determined.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105068"},"PeriodicalIF":3.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143687634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A knowledge-driven approach to multi-objective IoT task graph scheduling in fog-cloud computing","authors":"Hadi Gholami, Hongyang Sun","doi":"10.1016/j.jpdc.2025.105069","DOIUrl":"10.1016/j.jpdc.2025.105069","url":null,"abstract":"<div><div>Despite the significant growth of the Internet of Things (IoT), this emerging technology has prominent limitations, such as limited processing power and storage. Along with the expansion of IoT networks, the fog-cloud computing paradigm has been developed to optimize the provision of services to IoT users by offloading computations to more powerful processing resources. In this paper, with the aim of optimizing the multiple objectives of makespan, energy consumption, and cost, we develop a novel automatic three-module algorithm to schedule multiple task graphs offloaded from IoT devices to the fog-cloud environment. Our algorithm combines the Genetic Algorithm (GA) and the Random Forest (RF) classifier, which we call Hybrid GA-RF (HGARF). Each of the three modules has a responsibility and they are repeated sequentially to extract knowledge from the solution space in the form of IF-THEN rules. The first module is responsible for generating solutions for the training set using a GA. Here, we introduce a chromosome encoding method and a crossover operator to create diversity for multiple task graphs. By expressing a concept called bottleneck and two conditions, we also develop a mutation operator to identify and reduce the workload of certain processing centers. The second module aims at generating rules from the solutions of the training set, and to that end employs an RF classifier. Here, in addition to proposing features to construct decision trees, we develop a format for extracting and recording IF-THEN rules. The third module checks the quality of the generated rules and refines them by predicting the processing resources as well as removing less important rules from the rule set. Finally, the developed HGARF algorithm automatically determines its termination condition based on the quality of the provided solutions. Experimental results demonstrate that our method effectively improves the objective functions in large-size task graphs by up to 13.24% compared to some state-of-the-art methods.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105069"},"PeriodicalIF":3.4,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data quality management in big data: Strategies, tools, and educational implications","authors":"Thu Nguyen, Hong-Tri Nguyen, Tu-Anh Nguyen-Hoang","doi":"10.1016/j.jpdc.2025.105067","DOIUrl":"10.1016/j.jpdc.2025.105067","url":null,"abstract":"<div><div>This study addresses the critical need for effective Big Data Quality Management (BDQM) in education, a field where data quality has profound implications but remains underexplored. The work systematically progresses from requirement analysis and standard development to the deployment of tools for monitoring and enhancing data quality in big data workflows. The study's contributions are substantiated through five research questions that explore the impact of data quality on analytics, the establishment of evaluation standards, centralized management strategies, improvement techniques, and education-specific BDQM adaptations. By addressing these questions, the research advances both theoretical and practical frameworks, equipping stakeholders with the tools to enhance the reliability and efficiency of data-driven educational initiatives. Integrating Artificial Intelligence (AI) and distributed computing, this research introduces a novel multi-stage BDQM framework that emphasizes data quality assessment, centralized governance, and AI-enhanced improvement techniques. This work underscores the transformative potential of robust BDQM systems in supporting informed decision-making and achieving sustainable outcomes in educational projects. The survey findings highlight the potential for automated data management within big data architectures, suggesting that data quality frameworks can be significantly enhanced by leveraging AI and distributed computing. Additionally, the survey emphasizes emerging trends in big data quality management, specifically (i) automated data cleaning and cleansing and (ii) data enrichment and augmentation.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105067"},"PeriodicalIF":3.4,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143621250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}