{"title":"Teaching parallel and distributed computing in a single undergraduate-level course","authors":"Tia Newhall","doi":"10.1016/j.jpdc.2025.105092","DOIUrl":"10.1016/j.jpdc.2025.105092","url":null,"abstract":"<div><div>As the application of parallel distributed computing (PDC) becomes ever more pervasive, it is increasingly important that undergraduate CS curricula expose students to a wide range of PDC topics in order to prepare them for the workforce. We present the curricular design and learning goals of an upper-level undergraduate course that covers a wide breadth of topics in parallel and distributed computing, while also providing students with depth of experience and development of problem solving, programming, and analysis skills. We discuss lessons learned from our experiences teaching this course over 15 years, and we discuss changes and improvements we have made in its offerings, as well as choices and trade-offs we made to achieve a balance between breadth and depth of coverage across these two huge fields. Evaluations from students support that our approach works well meeting the goals of exposing students to a broad range of PDC topics, building important PDC thinking and programming skills, and meeting other pedagogical goals of an advanced upper-level undergraduate CS course. 
Although initially designed as a single course due to constraints that are common to smaller schools, our experiences with this course lead us to conclude that it is a good approach for an advanced undergraduate course on PDC at any institution.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105092"},"PeriodicalIF":3.4,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143912437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00065-6","DOIUrl":"10.1016/S0743-7315(25)00065-6","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"201 ","pages":"Article 105098"},"PeriodicalIF":3.4,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experience with adapting to a software framework for a use-case in computational science","authors":"V. Venkatesh Shenoi, Nisha Agrawal","doi":"10.1016/j.jpdc.2025.105090","DOIUrl":"10.1016/j.jpdc.2025.105090","url":null,"abstract":"<div><div>The effective use of HPC infrastructure critically depends on the human resources involved in the maintenance and operation of these systems alongside the domain scientists and scientific programmers who develop scientific applications to leverage these systems. The workforce typically consists of undergraduates/postgraduates in different fields with broad areas of training in scientific computing and some programming skills with aptitude in HPC. However, there is a gap in the university-level curriculum and the skill set required to adapt to the requirements for developing scientific applications. Some efforts are there to fill this gap through workforce training programs to prepare the graduates for HPC jobs in industry/national labs. In this work, we share our experience training the workforce to adapt to AMReX (<span><span>https://amrex-codes.github.io/amrex/docs_html/</span><svg><path></path></svg></span>), a software framework developed under the Exascale computing project for scientific application development. It requires recapitulation of partial differential equations (PDEs), an indispensable mathematical model for describing physical systems across different scientific domains. We discuss our engagement with the intern, the trainees, and the development team in orienting them to scientific computing on the HPC platform, PDE solvers in particular. We highlight some of the features of the AMReX framework that helped the development team to contribute AMReX-based phase field solvers in the MicroSim phase field solver suite as a case study in adapting to the framework. 
These solvers can target different architectures without modifications due to the abstraction layer that provides immunity to developers for programming on different architectures. This experience can help to evolve a training model to build the HPC workforce.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105090"},"PeriodicalIF":3.4,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143886152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2-edge-Hamilton-connected dragonfly network","authors":"Huimei Guo , Rong-Xia Hao , Jie Wu","doi":"10.1016/j.jpdc.2025.105095","DOIUrl":"10.1016/j.jpdc.2025.105095","url":null,"abstract":"<div><div>The dragonfly networks are being used in the supercomputers of today. It is of interest to study the topological properties of dragonfly networks. Let <span><math><mi>G</mi><mo>=</mo><mo>(</mo><mi>V</mi><mo>(</mo><mi>G</mi><mo>)</mo><mo>,</mo><mi>E</mi><mo>(</mo><mi>G</mi><mo>)</mo><mo>)</mo></math></span> be a graph. Let <em>X</em> be a subset of <span><math><mo>{</mo><mi>u</mi><mi>v</mi><mo>:</mo><mi>u</mi><mo>,</mo><mi>v</mi><mo>∈</mo><mi>V</mi><mo>(</mo><mi>G</mi><mo>)</mo><mspace></mspace><mtext>and</mtext><mspace></mspace><mi>u</mi><mo>≠</mo><mi>v</mi><mo>}</mo></math></span> such that every component induced by <em>X</em> on <span><math><mi>V</mi><mo>(</mo><mi>G</mi><mo>)</mo></math></span> is a path. If, <span><math><mo>|</mo><mi>X</mi><mo>|</mo><mo>≤</mo><mi>k</mi></math></span> and after adding all edges in <em>X</em> to <em>G</em>, the resulting graph contains a Hamiltonian cycle that includes all edges in <em>X</em>, then the graph <em>G</em> is called <em>k</em>-edge-Hamilton-connected. This property can be used to design and optimize routing and forwarding algorithms. By finding such Hamiltonian cycle containing specific edges in the network, it can be ensured that every node can act as an intermediate node to forward packets through a specific channel, thus enabling efficient data transmission and routing. For <span><math><mi>k</mi><mo>=</mo><mn>2</mn></math></span>, determining whether a graph is <em>k</em>-edge-Hamilton-connected is a challenging problem, as it is known to be NP-complete. 2-edge-Hamilton-connected is an extension of Hamilton-connected. 
In this paper, we prove that the relative arrangement dragonfly network, a type of dragonfly network constructed by the global connections based on relative arrangements, is 2-edge-Hamilton-connected, and this property shows that dragonfly networks have strong reliability. In addition, we determined that <span><math><mi>D</mi><mo>(</mo><mi>n</mi><mo>,</mo><mi>h</mi><mo>,</mo><mi>g</mi><mo>)</mo></math></span> is 1-Hamilton-connected and paired 2-disjoint path coverable with <span><math><mi>n</mi><mo>≥</mo><mn>4</mn></math></span> and <span><math><mi>h</mi><mo>≥</mo><mn>2</mn></math></span>.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105095"},"PeriodicalIF":3.4,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143895554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced resource management: A hands-on master course in HPC and cloud computing","authors":"Lucia Pons, Salvador Petit, Julio Sahuquillo","doi":"10.1016/j.jpdc.2025.105091","DOIUrl":"10.1016/j.jpdc.2025.105091","url":null,"abstract":"<div><div>Resource management has become a major concern in dealing with performance and fairness in recent computing servers, including a wide variety of shared resources. To achieve high-performing and efficient systems, both hardware and software engineers must be thoroughly trained in effective resource management techniques. This paper introduces the GRE master course (Spanish acronym for Resource Management and Performance Evaluation in Cloud and High-Performance Workloads), which is being offered since Fall 2023. The course is taught by instructors with broad research expertise in resource management and performance evaluation. Subjects covered in this course include workload characterization, state-of-the-art resource management approaches, and performance evaluation tools and methodologies used in production systems. Management techniques are studied both in the context of HPC and cloud computing, where resource efficiency is becoming a primary concern. To enhance the learning experience, the course integrates theoretical concepts with a wide set of hands-on tasks carried out on recent real platforms. A real cloud virtualized environment is mimicked using typical software deployed in production systems such as Proxmox Virtual Environment. Students learn to use tools such as Linux Perf and Intel Vtune Profiler, which are commonly employed by researchers and practitioners to carry out typical tasks like performance bottleneck analysis from a microarchitectural perspective. Overall, the GRE course provides students with a solid foundation and skills in resource management by addressing current hot topics both in the industry and academia. 
Student satisfaction and learning outcomes prove the success of the GRE course and encourage us to continue in this direction.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105091"},"PeriodicalIF":3.4,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HIP-RRTMG_SW: Accelerating a shortwave radiative transfer scheme under the heterogeneous-compute interface for portability (HIP) framework","authors":"Zhenzhen Wang , Yuzhu Wang , Fei Li , Jinrong Jiang , Xiaocong Wang","doi":"10.1016/j.jpdc.2025.105094","DOIUrl":"10.1016/j.jpdc.2025.105094","url":null,"abstract":"<div><div>With the development of higher-resolution atmospheric circulation models, the amount of calculation increases polynomially with resolution, and the calculation accuracy of physical processes is increasing rapidly. The traditional parallel computing methods based on multi-core CPUs can no longer meet the requirements of high efficiency and real-time computing performance of climate models. In order to improve the computational efficiency and scalability of the Atmospheric General Circulation Model, it is urgent to study efficient parallel algorithms and performance optimization methods for radiation physical process with massive calculations. In this paper, a heterogeneous multidimensional acceleration algorithm is proposed for the shortwave radiation transfer model (RRTMG_SW) based on HIP. Then, the HIP version of RRTMG_SW is developed, namely HIP-RRTMG_SW. In addition, combined with the “MPI + HIP” hybrid programming model, a multi-GPU implementation of RRTMG_SW is also proposed, and it makes full use of the multi-node, multi-core CPU and multi-GPU computing capability of a heterogeneous high performance computing system. Experimental results show that HIP-RRTMG_SW achieves 7.05× of acceleration in the climate simulation with 0.25<sup>∘</sup> resolution using 16 AMD GPUs on the ORISE supercomputer compared with RRTMG_SW using 128 CPU cores. 
When using 1024 AMD GPUs, HIP-RRTMG_SW is 83.94× faster than RRTMG_SW with 128 CPU cores, indicating that the proposed multi-GPU acceleration algorithm has strong scalability.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105094"},"PeriodicalIF":3.4,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient parameter tuning for a structure-based virtual screening HPC application","authors":"Bruno Guindani, Davide Gadioli, Roberto Rocco, Danilo Ardagna, Gianluca Palermo","doi":"10.1016/j.jpdc.2025.105087","DOIUrl":"10.1016/j.jpdc.2025.105087","url":null,"abstract":"<div><div>Virtual screening applications are highly parameterized to optimize the balance between quality and execution performance. While output quality is critical, the entire screening process must be completed within a reasonable time. In fact, a slight reduction in output accuracy may be acceptable when dealing with large datasets. Finding the optimal quality-throughput trade-off depends on the specific HPC system used and should be re-evaluated with each new deployment or significant code update. This paper presents two parallel autotuning techniques for constrained optimization in distributed High-Performance Computing (HPC) environments. These techniques extend sequential Bayesian Optimization (BO) with two parallel asynchronous approaches, and they integrate predictions from Machine Learning (ML) models to help comply with constraints. Our target application is LiGen, a real-world virtual screening software for drug discovery. The proposed methods address two relevant challenges: efficient exploration of the parameter space and performance measurement using domain-specific metrics and procedures. We conduct an experimental campaign comparing the two methods with a popular state-of-the-art autotuner. 
Results show that our methods find configurations that are, on average, up to 35–42% better than the ones found by the autotuner and the default expert-picked LiGen configuration.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105087"},"PeriodicalIF":3.4,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143860372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Schedule multi-instance microservices to minimize response time under budget constraint in cloud HPC systems","authors":"Dong Wang , Hong Shen , Hui Tian , Yuanhao Yang","doi":"10.1016/j.jpdc.2025.105086","DOIUrl":"10.1016/j.jpdc.2025.105086","url":null,"abstract":"<div><div>In the emerging microservice-based architecture of cloud HPC systems, a challenging problem of critical importance for system service capability is how we can schedule microservices to minimize the end-to-end response time for user requests while keeping cost within the specified budget. We address this problem for multi-instance microservices requested by a single application to which no existing result is known to our knowledge. We propose an effective two-stage solution of first allocating budget (resources) to microservices within the budget constraint and then deploying microservice instances on servers to minimize system operational overhead. For budget allocation, we formulate it as the Discrete Time Cost Tradeoff (DTCT) problem which is NP-hard, present a linear program (LP) based algorithm, and provide a rigorous proof of its worst-case performance guarantee of 4 from the optimal solution. For microservice deployment, we show that it is harder than the NP-hard problem of 1-D binpacking through establishing its mathematical model, and propose a heuristic algorithm of Least First Mapping that greedily places microservice instances on fewest possible servers to minimize system operation cost. 
The experiment results of extensive simulations on DAG-based applications of different sizes demonstrate the superior performance of our algorithm in comparison with the existing approaches.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"202 ","pages":"Article 105086"},"PeriodicalIF":3.4,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143839548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00041-3","DOIUrl":"10.1016/S0743-7315(25)00041-3","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"200 ","pages":"Article 105074"},"PeriodicalIF":3.4,"publicationDate":"2025-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143785399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}