{"title":"The (t,k)-diagnosability of Cayley graph generated by 2-tree","authors":"Lulu Yang , Shuming Zhou , Eddie Cheng","doi":"10.1016/j.jpdc.2025.105068","DOIUrl":"10.1016/j.jpdc.2025.105068","url":null,"abstract":"<div><div>Multiprocessor systems, which typically use interconnection networks (or graphs) as underlying topologies, are widely utilized for big data analysis in scientific computing due to advancements in technologies such as cloud computing, IoT, and social networks. With the dramatic expansion in the scale of multiprocessor systems, the pursuit and optimization of strategies for identifying faulty processors have become crucial to ensuring the normal operation of high-performance computing systems. System-level diagnosis is a process designed to distinguish between faulty processors and fault-free processors in multiprocessor systems. The <span><math><mo>(</mo><mi>t</mi><mo>,</mo><mi>k</mi><mo>)</mo></math></span>-diagnosis, a generalization of sequential diagnosis, proceeds to identify at least <em>k</em> faulty processors and repair them in each iteration under the assumption that there are at most <em>t</em> faulty processors whenever <span><math><mi>t</mi><mo>≥</mo><mi>k</mi></math></span>. We show that the Cayley graph generated by a 2-tree is <span><math><mo>(</mo><msup><mrow><mn>2</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>3</mn></mrow></msup><mo>,</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn><mo>)</mo></math></span>-diagnosable under the PMC model for <span><math><mi>n</mi><mo>≥</mo><mn>5</mn></math></span> while it is <span><math><mo>(</mo><mfrac><mrow><msup><mrow><mn>2</mn></mrow><mrow><mi>n</mi><mo>−</mo><mn>3</mn></mrow></msup><mo>(</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>6</mn><mo>)</mo></mrow><mrow><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn></mrow></mfrac><mo>,</mo><mn>2</mn><mi>n</mi><mo>−</mo><mn>4</mn><mo>)</mo></math></span>-diagnosable under the MM<sup>⁎</sup> model for <span><math><mi>n</mi><mo>≥</mo><mn>4</mn></math></span>. 
As an empirical case study, the <span><math><mo>(</mo><mi>t</mi><mo>,</mo><mi>k</mi><mo>)</mo></math></span>-diagnosabilities of the alternating group graph <span><math><mi>A</mi><msub><mrow><mi>G</mi></mrow><mrow><mi>n</mi></mrow></msub></math></span> under the PMC model and the MM* model have been determined.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105068"},"PeriodicalIF":3.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143687634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
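Under the PMC model, each processor tests its neighbours: a fault-free tester reports the true status of the tested unit, while a faulty tester's outcome is unreliable. The sketch below, assuming a plain adjacency-set graph and hypothetical helpers `pmc_syndrome` and `is_consistent`, generates a syndrome and checks whether a candidate fault set agrees with it. It illustrates the test model only; it is not the (t,k)-diagnosis algorithm from the paper.

```python
from typing import Dict, Set, Tuple

# (tester, tested) -> 0 (tested node judged fault-free) or 1 (judged faulty)
Syndrome = Dict[Tuple[int, int], int]


def pmc_syndrome(adj: Dict[int, Set[int]], faulty: Set[int], faulty_says: int = 0) -> Syndrome:
    """Build a PMC syndrome. A fault-free tester reports the tested node's true
    status; a faulty tester's output is arbitrary, fixed to `faulty_says` here."""
    sigma: Syndrome = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            sigma[(u, v)] = faulty_says if u in faulty else int(v in faulty)
    return sigma


def is_consistent(adj: Dict[int, Set[int]], sigma: Syndrome, candidate: Set[int]) -> bool:
    """A candidate fault set F is consistent with sigma if every test performed by
    a node outside F reports exactly whether the tested node lies in F."""
    for u, nbrs in adj.items():
        if u in candidate:
            continue  # tests run by (assumed) faulty nodes carry no information
        for v in nbrs:
            if sigma[(u, v)] != int(v in candidate):
                return False
    return True
```

For example, on the 4-cycle 0-1-2-3-0 with node 2 faulty, the set `{2}` is consistent with the generated syndrome, while the empty set and `{1}` are not. Diagnosability results such as those above bound how many faults may be present before two different fault sets can explain the same syndrome.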
{"title":"A knowledge-driven approach to multi-objective IoT task graph scheduling in fog-cloud computing","authors":"Hadi Gholami, Hongyang Sun","doi":"10.1016/j.jpdc.2025.105069","DOIUrl":"10.1016/j.jpdc.2025.105069","url":null,"abstract":"<div><div>Despite the significant growth of Internet of Things (IoT), there are prominent limitations of this emerging technology, such as limited processing power and storage. Along with the expansion of IoT networks, the fog-cloud computing paradigm has been developed to optimize the provision of services to IoT users by offloading computations to the more powerful processing resources. In this paper, with the aim of optimizing multiple objectives of makespan, energy consumption, and cost, we develop a novel automatic three-module algorithm to schedule multiple task graphs offloaded from IoT devices to the fog-cloud environment. Our algorithm combines the Genetic Algorithm (GA) and the Random Forest (RF) classifier, which we call Hybrid GA-RF (HGARF). Each of the three modules has a responsibility and they are repeated sequentially to extract knowledge from the solution space in the form of IF-THEN rules. The first module is responsible for generating solutions for the training set using a GA. Here, we introduce a chromosome encoding method and a crossover operator to create diversity for multiple task graphs. By expressing a concept called bottleneck and two conditions, we also develop a mutation operator to identify and reduce the workload of certain processing centers. The second module aims at generating rules from the solutions of the training set, and to that end employs an RF classifier. Here, in addition to proposing features to construct decision trees, we develop a format for extracting and recording IF-THEN rules. The third module checks the quality of the generated rules and refines them by predicting the processing resources as well as removing less important rules from the rule set. 
Finally, the developed HGARF algorithm automatically determines its termination condition based on the quality of the provided solutions. Experimental results demonstrate that our method effectively improves the objective functions in large-size task graphs by up to 13.24 % compared to some state-of-the-art methods.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105069"},"PeriodicalIF":3.4,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data quality management in big data: Strategies, tools, and educational implications","authors":"Thu Nguyen , Hong-Tri Nguyen , Tu-Anh Nguyen-Hoang","doi":"10.1016/j.jpdc.2025.105067","DOIUrl":"10.1016/j.jpdc.2025.105067","url":null,"abstract":"<div><div>This study addresses the critical need for effective Big Data Quality Management (BDQM) in education, a field where data quality has profound implications but remains underexplored. The work systematically progresses from requirement analysis and standard development to the deployment of tools for monitoring and enhancing data quality in big data workflows. The study's contributions are substantiated through five research questions that explore the impact of data quality on analytics, the establishment of evaluation standards, centralized management strategies, improvement techniques, and education-specific BDQM adaptations. By addressing these questions, the research advances both theoretical and practical frameworks, equipping stakeholders with the tools to enhance the reliability and efficiency of data-driven educational initiatives. Integrating Artificial Intelligence (AI) and distributed computing, this research introduces a novel multi-stage BDQM framework that emphasizes data quality assessment, centralized governance, and AI-enhanced improvement techniques. This work underscores the transformative potential of robust BDQM systems in supporting informed decision-making and achieving sustainable outcomes in educational projects. The survey findings highlight the potential for automated data management within big data architectures, suggesting that data quality frameworks can be significantly enhanced by leveraging AI and distributed computing. 
Additionally, the survey emphasizes emerging trends in big data quality management, specifically (i) automated data cleaning and cleansing and (ii) data enrichment and augmentation.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105067"},"PeriodicalIF":3.4,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143621250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IMI-GPU: Inverted multi-index for billion-scale approximate nearest neighbor search with GPUs","authors":"Alan Araujo , Willian Barreiros Jr. , Jun Kong , Renato Ferreira , George Teodoro","doi":"10.1016/j.jpdc.2025.105066","DOIUrl":"10.1016/j.jpdc.2025.105066","url":null,"abstract":"<div><div>Similarity search is utilized in specialized database systems designed to handle multimedia data, often represented by high-dimensional features. In this paper, we focus on speeding up the search process with GPUs. This problem has previously been approached by accelerating the Inverted File with Asymmetric Distance Computation algorithm on GPUs (IVFADC-GPU). However, the most recent algorithm for CPU, Inverted Multi-Index (IMI), was not considered for parallelization because it was deemed too challenging for efficient GPU deployment. Thus, we propose a novel and efficient version of IMI for GPUs called IMI-GPU. We propose a new design of the multi-sequence algorithm of IMI, enabling efficient GPU execution. We compared IMI-GPU with IVFADC-GPU using a billion-scale dataset in which IMI-GPU achieved speedups of about 3.2× and 1.9× at Recall@1 and Recall@16, respectively. The algorithms have been compared in a variety of scenarios, and our novel IMI-GPU has been shown to significantly outperform IVFADC on GPUs for the majority of tested cases.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105066"},"PeriodicalIF":3.4,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143550639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
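IMI splits each vector into two halves, quantizes each half with its own codebook, and visits the cells of the resulting two-dimensional grid in increasing order of summed distance to the query. A minimal CPU sketch of that traversal, assuming the per-half distances are already sorted in ascending order, uses a min-heap over index pairs plus a visited set. This is the generic best-first enumeration, not the GPU-oriented redesign proposed in the paper, and the name `multi_sequence` is hypothetical.

```python
import heapq
from typing import List, Tuple


def multi_sequence(d1: List[float], d2: List[float], k: int) -> List[Tuple[int, int]]:
    """Return the first k grid cells (i, j) in nondecreasing order of d1[i] + d2[j].

    Assumes d1 and d2 are each sorted ascending, so the next-best cell is always
    the right or lower neighbour of a cell that has already been emitted.
    """
    if not d1 or not d2 or k <= 0:
        return []
    heap = [(d1[0] + d2[0], 0, 0)]
    seen = {(0, 0)}
    out: List[Tuple[int, int]] = []
    while heap and len(out) < k:
        _, i, j = heapq.heappop(heap)
        out.append((i, j))
        # Push the two neighbours that could become the next-smallest sum.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(d1) and nj < len(d2) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (d1[ni] + d2[nj], ni, nj))
    return out
```

With `d1 = [0, 1, 5]` and `d2 = [0, 2, 3]`, the first four cells visited are `(0, 0)`, `(1, 0)`, `(0, 1)`, `(0, 2)`. The heap is inherently sequential, which hints at why this step needed a new design for GPU execution.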
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00027-9","DOIUrl":"10.1016/S0743-7315(25)00027-9","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105060"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143527269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Massive parallel simulation of gas turbine combustion using a fully implicit unstructured solver on the heterogeneous Sunway Taihulight supercomputer","authors":"Fei Gao , Hu Ren , Zhuyin Ren , Ming Liu , Chengpeng Zhao , Guangwen Yang","doi":"10.1016/j.jpdc.2025.105055","DOIUrl":"10.1016/j.jpdc.2025.105055","url":null,"abstract":"<div><div>Massively parallel simulations of a full annular aeroengine combustor chamber have been achieved on the on-chip heterogeneous Sunway Taihulight supercomputer. A billion-size unstructured mesh is generated through grid replication and rotation, accompanied by the development of an efficient geometric matching algorithm to address the conformal interface issue. We developed graph-based and tree-based loop fusion approaches for the implicit solution procedure of the momentum equation and found that the strategic utilization of data reuse and the separation of vector computation significantly enhance performance on the many-core processor. For the linear system, a finer-grained parallelization based on sparse matrix-vector multiplication and vector computation is validated. Massively parallel tests utilizing 16 K processes with 1 M cores are successfully conducted to simulate the turbulent non-premixed combustion in an aeroengine combustor with nearly one billion cells. 
Compared to the pre-optimization version, this fully accelerated code achieves an impressive 5.48 times speedup in overall performance, with a parallel efficiency of up to 59 %.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105055"},"PeriodicalIF":3.4,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143445654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
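The linear-system step is parallelized around sparse matrix-vector multiplication (SpMV). As a hedged illustration, assuming the matrix is stored in compressed sparse row (CSR) format and rows are split into contiguous blocks (one block per hypothetical worker), the row-blocked kernel below shows why SpMV parallelizes at fine grain: each output row depends only on its own nonzeros and the shared input vector. It does not reproduce the Sunway-specific scheme from the paper.

```python
from typing import List, Sequence


def spmv_csr_block(indptr: Sequence[int], indices: Sequence[int], data: Sequence[float],
                   x: Sequence[float], row_start: int, row_end: int) -> List[float]:
    """Compute rows row_start..row_end-1 of y = A @ x for a CSR matrix A."""
    y = []
    for r in range(row_start, row_end):
        acc = 0.0
        # Nonzeros of row r live in data[indptr[r]:indptr[r + 1]].
        for p in range(indptr[r], indptr[r + 1]):
            acc += data[p] * x[indices[p]]
        y.append(acc)
    return y


def spmv_csr_blocked(indptr: Sequence[int], indices: Sequence[int], data: Sequence[float],
                     x: Sequence[float], n_blocks: int) -> List[float]:
    """Split rows into n_blocks contiguous blocks. Each block is independent work
    that a separate core could process; here the blocks simply run in sequence."""
    n_rows = len(indptr) - 1
    if n_rows == 0:
        return []
    size = -(-n_rows // n_blocks)  # ceiling division
    y: List[float] = []
    for start in range(0, n_rows, size):
        y.extend(spmv_csr_block(indptr, indices, data, x, start, min(start + size, n_rows)))
    return y
```

For the 3×3 matrix `[[4, 0, 1], [0, 2, 0], [3, 0, 5]]` (CSR arrays `indptr=[0, 2, 3, 5]`, `indices=[0, 2, 1, 0, 2]`, `data=[4, 1, 2, 3, 5]`) and `x = [1, 2, 3]`, every block count yields `[7.0, 4.0, 18.0]`; blocking changes only how the work is distributed.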
{"title":"Distributed landmark labeling for social networks","authors":"Arda Şener, Hüsnü Yenigün, Kamer Kaya","doi":"10.1016/j.jpdc.2025.105057","DOIUrl":"10.1016/j.jpdc.2025.105057","url":null,"abstract":"<div><div>Distance queries are a fundamental part of many network analysis applications. They can be used to infer the closeness of two users in social networks, the relation between two sites in a web graph, or the importance of the interaction between two proteins or molecules. Being able to answer these queries rapidly has many benefits in the area of network analysis. Pruned Landmark Labeling (<span>Pll</span>) is a technique used to generate an index for a given graph that allows the shortest path queries to be completed in a fraction of the time when compared to a standard breadth-first or a depth-first search-based algorithm. Parallel Shortest-distance Labeling (<span>Psl</span>) reorganizes the steps of <span>Pll</span> for the multithreaded setting and is designed particularly for social networks for which the index sizes can be much larger than what a single server can store. Even for a medium-size, 5 million vertex graph, the index size can be more than 40 GB. This paper proposes a hybrid, shared- and distributed-memory algorithm, DPSL, by partitioning the input graph via a vertex separator. The proposed method improves both the parallel execution time and the maximum memory consumption by distributing both the data and the work across multiple nodes of a cluster. 
For instance, on a graph with 5M vertices and 150M edges, using 4 nodes, DPSL reduces the execution time and maximum memory consumption by 2.13× and 1.87×, respectively, compared to our improved implementation of <span>Psl</span>.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105057"},"PeriodicalIF":3.4,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143427648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
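Pruned landmark labeling answers a distance query from two precomputed labels: each vertex stores (hub, distance) pairs, and dist(s, t) is the minimum of d(s, h) + d(h, t) over hubs h present in both labels. The sketch below builds such labels for a small unweighted graph with pruned BFS, visiting roots in descending-degree order (a common heuristic assumed here). It is a single-threaded illustration of the labeling idea, not the `Psl` or DPSL algorithm, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

INF = float("inf")


def query(labels: List[Dict[int, int]], s: int, t: int) -> float:
    """2-hop query: minimum of d(s, h) + d(h, t) over hubs h common to both labels."""
    ls, lt = labels[s], labels[t]
    if len(ls) > len(lt):
        ls, lt = lt, ls  # iterate over the smaller label
    return min((d + lt[h] for h, d in ls.items() if h in lt), default=INF)


def build_pll(adj: List[List[int]]) -> List[Dict[int, int]]:
    """Build 2-hop labels with one pruned BFS per root (unweighted graph)."""
    n = len(adj)
    order = sorted(range(n), key=lambda v: -len(adj[v]))  # high degree first
    labels: List[Dict[int, int]] = [dict() for _ in range(n)]
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: earlier hubs already certify a root-u path of length <= d,
            # so u gets no new label and its neighbours are not expanded from here.
            if query(labels, root, u) <= d:
                continue
            labels[u][root] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels
```

On a 5-cycle 0-1-2-3-4-0 with a tail 2-5-6, `query(labels, 0, 6)` returns 4 and `query(labels, 0, 3)` returns 2. The labels are what grows large on social networks, which is the memory pressure that distributing the index across nodes addresses.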
{"title":"Exploring data science workflows: A practice-oriented approach to teaching processing of massive datasets","authors":"Johannes Schoder , H. Martin Bücker","doi":"10.1016/j.jpdc.2025.105043","DOIUrl":"10.1016/j.jpdc.2025.105043","url":null,"abstract":"<div><div>Massive datasets are typically processed by a sequence of different stages, comprising data acquisition and preparation, data processing, data analysis, result validation, and visualization. In conjunction, these stages form a data science workflow, a key element enabling the solution of data-intensive problems. The complexity and heterogeneity of these stages require a diverse set of techniques and skills. This article discusses a hands-on practice-oriented approach aiming to enable and motivate graduate students to engage with realistic data science workflows. A major goal of the approach is to bridge the gap between academia and industry by integrating programming assignments that implement different data workflows with real-world data. In consecutive assignments, students are exposed to the methodology of solving problems using big data frameworks and are required to implement different data workflows of varying complexity. This practice-oriented approach is well received by students, as confirmed by different surveys.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105043"},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient GPU-accelerated parallel cross-correlation","authors":"Karel Maděra, Adam Šmelko, Martin Kruliš","doi":"10.1016/j.jpdc.2025.105054","DOIUrl":"10.1016/j.jpdc.2025.105054","url":null,"abstract":"<div><div>Cross-correlation is a data analysis method widely employed in various signal processing and similarity-search applications. Our objective is to design a highly optimized GPU-accelerated implementation that will speed up the applications and also improve energy efficiency, since GPUs are more efficient than CPUs in data-parallel tasks. There are two basic ways to compute cross-correlation: a definition-based algorithm that tries all possible overlaps, and an algorithm based on the Fourier transform, which is much more complex but has better asymptotic time complexity. We have focused mainly on the definition-based approach, which is better suited for smaller input data, and we have implemented multiple CUDA-enabled algorithms with multiple optimization options. The algorithms were evaluated on various scenarios, including the most typical types of multi-signal correlations, and we provide empirically verified optimal solutions for each of the studied scenarios.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105054"},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
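The definition-based approach tries every overlap (lag) of the two signals. For signals of lengths n and m it performs O(n·m) multiply-adds, whereas an FFT-based approach needs O((n+m) log(n+m)), which is why the former suits smaller inputs. A minimal sequential sketch of the definition-based variant is shown below. It assumes real-valued inputs and the lag convention of NumPy's `numpy.correlate(x, y, mode="full")`. Each lag is computed independently, which is one natural unit of GPU parallelism; the CUDA kernels in the paper are not reproduced here.

```python
from typing import List, Sequence


def cross_correlate_full(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Definition-based 'full' cross-correlation of real signals:
    out[k + len(y) - 1] = sum_i x[i + k] * y[i], for k = -(len(y) - 1) .. len(x) - 1.
    """
    n, m = len(x), len(y)
    out = []
    for k in range(-(m - 1), n):
        # Keep only indices i where both x[i + k] and y[i] exist (the overlap).
        lo = max(0, -k)
        hi = min(m, n - k)
        out.append(sum(x[i + k] * y[i] for i in range(lo, hi)))
    return out
```

For example, `cross_correlate_full([1, 2, 3], [0, 1, 0.5])` gives `[0.5, 2.0, 3.5, 3.0, 0.0]`, the same result that NumPy documents for `np.correlate([1, 2, 3], [0, 1, 0.5], "full")`.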
{"title":"DePoL: Assuring training integrity in collaborative learning via decentralized verification","authors":"Zhicheng Xu , Xiaoli Zhang , Xuanyu Yin , Hongbing Cheng","doi":"10.1016/j.jpdc.2025.105056","DOIUrl":"10.1016/j.jpdc.2025.105056","url":null,"abstract":"<div><div>Collaborative learning enables multiple participants to jointly train complex models but is vulnerable to attacks like model poisoning or backdoor attacks. Ensuring training integrity can prevent these threats by blocking any tampered contributions from affecting the model. However, traditional approaches often suffer from single points of bottleneck or failure in decentralized environments. To address these issues, we propose <span>DePoL</span>, a secure, scalable, and efficient decentralized verification framework based on duplicated execution. <span>DePoL</span> leverages blockchain to distribute the verification tasks across multiple participant-formed groups, eliminating single-point bottlenecks. Within each group, redundant verification and a majority-based arbitration prevent single points of failure. To further enhance security, <span>DePoL</span> introduces a <em>two-stage plagiarism-free commitment scheme</em> to prevent untrusted verifiers from exploiting public on-chain data. Additionally, a <em>hybrid verification method</em> employs fuzzy matching to handle unpredictable reproduction errors, while a “slow path” ensures zero false positives for honest trainers. Our theoretical analysis demonstrates <span>DePoL</span>'s security and termination properties. Extensive evaluations show that <span>DePoL</span> has overhead similar to common distributed machine learning algorithms, while outperforming centralized verification schemes in scalability, reducing training latency by up to 46%. 
Additionally, <span>DePoL</span> effectively handles reproduction errors with 0 false positives.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"199 ","pages":"Article 105056"},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
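Within a verification group, several verifiers re-execute a trainer's step, and a majority vote decides the outcome, while fuzzy matching tolerates small reproduction errors (for example, from nondeterministic floating-point reductions). The sketch below assumes each verifier returns its reproduced parameter update as a list of floats and uses a hypothetical absolute tolerance `tol`. It shows one way to combine a tolerance-based match with strict-majority arbitration; it is not DePoL's actual hybrid verification method, commitment scheme, or slow path.

```python
from typing import List, Sequence


def fuzzy_match(claimed: Sequence[float], reproduced: Sequence[float], tol: float) -> bool:
    """True if lengths agree and every coordinate differs by at most `tol`
    (an L-infinity tolerance check; the tolerance value is an assumption)."""
    return len(claimed) == len(reproduced) and all(
        abs(a - b) <= tol for a, b in zip(claimed, reproduced)
    )


def majority_verdict(claimed: Sequence[float], reproductions: List[Sequence[float]],
                     tol: float = 1e-6) -> bool:
    """Accept the claimed update only if a strict majority of the verifiers'
    reproductions fuzzily match it; with no verifiers, reject."""
    votes = sum(fuzzy_match(claimed, r, tol) for r in reproductions)
    return 2 * votes > len(reproductions)
```

A strict majority means a single dishonest or erroneous verifier in a group of three cannot flip the outcome on its own. Choosing `tol` trades false positives against tolerance for benign numerical drift, which is the tension a separate slow path would resolve.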