Journal of Parallel and Distributed Computing: Latest Articles

Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-25 DOI: 10.1016/S0743-7315(25)00065-6
Volume 201, Article 105098
Citations: 0
Experience with adapting to a software framework for a use-case in computational science
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-24 DOI: 10.1016/j.jpdc.2025.105090
V. Venkatesh Shenoi, Nisha Agrawal
Volume 202, Article 105090
Abstract: The effective use of HPC infrastructure critically depends on the human resources involved in the maintenance and operation of these systems alongside the domain scientists and scientific programmers who develop scientific applications to leverage these systems. The workforce typically consists of undergraduates/postgraduates in different fields with broad areas of training in scientific computing and some programming skills with aptitude in HPC. However, there is a gap between the university-level curriculum and the skill set required to adapt to the requirements for developing scientific applications. There are some efforts to fill this gap through workforce training programs that prepare graduates for HPC jobs in industry/national labs. In this work, we share our experience training the workforce to adapt to AMReX (https://amrex-codes.github.io/amrex/docs_html/), a software framework developed under the Exascale Computing Project for scientific application development. It requires recapitulation of partial differential equations (PDEs), an indispensable mathematical model for describing physical systems across different scientific domains. We discuss our engagement with the intern, the trainees, and the development team in orienting them to scientific computing on the HPC platform, PDE solvers in particular. We highlight some of the features of the AMReX framework that helped the development team to contribute AMReX-based phase field solvers in the MicroSim phase field solver suite as a case study in adapting to the framework. These solvers can target different architectures without modifications due to the abstraction layer that provides immunity to developers for programming on different architectures. This experience can help to evolve a training model to build the HPC workforce.
Citations: 0
Advanced resource management: A hands-on master course in HPC and cloud computing
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-23 DOI: 10.1016/j.jpdc.2025.105091
Lucia Pons, Salvador Petit, Julio Sahuquillo
Volume 202, Article 105091
Abstract: Resource management has become a major concern in dealing with performance and fairness in recent computing servers, including a wide variety of shared resources. To achieve high-performing and efficient systems, both hardware and software engineers must be thoroughly trained in effective resource management techniques. This paper introduces the GRE master course (Spanish acronym for Resource Management and Performance Evaluation in Cloud and High-Performance Workloads), which has been offered since Fall 2023. The course is taught by instructors with broad research expertise in resource management and performance evaluation. Subjects covered in this course include workload characterization, state-of-the-art resource management approaches, and performance evaluation tools and methodologies used in production systems. Management techniques are studied both in the context of HPC and cloud computing, where resource efficiency is becoming a primary concern. To enhance the learning experience, the course integrates theoretical concepts with a wide set of hands-on tasks carried out on recent real platforms. A real cloud virtualized environment is mimicked using typical software deployed in production systems such as Proxmox Virtual Environment. Students learn to use tools such as Linux Perf and Intel VTune Profiler, which are commonly employed by researchers and practitioners to carry out typical tasks like performance bottleneck analysis from a microarchitectural perspective. Overall, the GRE course provides students with a solid foundation and skills in resource management by addressing current hot topics both in the industry and academia. Student satisfaction and learning outcomes prove the success of the GRE course and encourage us to continue in this direction.
Citations: 0
HIP-RRTMG_SW: Accelerating a shortwave radiative transfer scheme under the heterogeneous-compute interface for portability (HIP) framework
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-23 DOI: 10.1016/j.jpdc.2025.105094
Zhenzhen Wang, Yuzhu Wang, Fei Li, Jinrong Jiang, Xiaocong Wang
Volume 202, Article 105094
Abstract: With the development of higher-resolution atmospheric circulation models, the amount of calculation increases polynomially with resolution, and the calculation accuracy of physical processes is increasing rapidly. The traditional parallel computing methods based on multi-core CPUs can no longer meet the requirements of high efficiency and real-time computing performance of climate models. In order to improve the computational efficiency and scalability of the Atmospheric General Circulation Model, it is urgent to study efficient parallel algorithms and performance optimization methods for radiation physical processes with massive calculations. In this paper, a heterogeneous multidimensional acceleration algorithm is proposed for the shortwave radiation transfer model (RRTMG_SW) based on HIP. Then, the HIP version of RRTMG_SW is developed, namely HIP-RRTMG_SW. In addition, combined with the "MPI + HIP" hybrid programming model, a multi-GPU implementation of RRTMG_SW is also proposed, and it makes full use of the multi-node, multi-core CPU and multi-GPU computing capability of a heterogeneous high performance computing system. Experimental results show that HIP-RRTMG_SW achieves 7.05× acceleration in the climate simulation with 0.25° resolution using 16 AMD GPUs on the ORISE supercomputer compared with RRTMG_SW using 128 CPU cores. When using 1024 AMD GPUs, HIP-RRTMG_SW is 83.94× faster than RRTMG_SW with 128 CPU cores, indicating that the proposed multi-GPU acceleration algorithm has strong scalability.
Citations: 0
Editor's note
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-18 DOI: 10.1016/j.jpdc.2025.105089
Ananth Kalyanaraman
Volume 202, Article 105089
Citations: 0
Efficient parameter tuning for a structure-based virtual screening HPC application
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-15 DOI: 10.1016/j.jpdc.2025.105087
Bruno Guindani, Davide Gadioli, Roberto Rocco, Danilo Ardagna, Gianluca Palermo
Volume 202, Article 105087
Abstract: Virtual screening applications are highly parameterized to optimize the balance between quality and execution performance. While output quality is critical, the entire screening process must be completed within a reasonable time. In fact, a slight reduction in output accuracy may be acceptable when dealing with large datasets. Finding the optimal quality-throughput trade-off depends on the specific HPC system used and should be re-evaluated with each new deployment or significant code update. This paper presents two parallel autotuning techniques for constrained optimization in distributed High-Performance Computing (HPC) environments. These techniques extend sequential Bayesian Optimization (BO) with two parallel asynchronous approaches, and they integrate predictions from Machine Learning (ML) models to help comply with constraints. Our target application is LiGen, a real-world virtual screening software for drug discovery. The proposed methods address two relevant challenges: efficient exploration of the parameter space and performance measurement using domain-specific metrics and procedures. We conduct an experimental campaign comparing the two methods with a popular state-of-the-art autotuner. Results show that our methods find configurations that are, on average, up to 35–42% better than the ones found by the autotuner and the default expert-picked LiGen configuration.
Citations: 0
Schedule multi-instance microservices to minimize response time under budget constraint in cloud HPC systems
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-08 DOI: 10.1016/j.jpdc.2025.105086
Dong Wang, Hong Shen, Hui Tian, Yuanhao Yang
Volume 202, Article 105086
Abstract: In the emerging microservice-based architecture of cloud HPC systems, a challenging problem of critical importance for system service capability is how we can schedule microservices to minimize the end-to-end response time for user requests while keeping cost within the specified budget. We address this problem for multi-instance microservices requested by a single application, for which no existing result is known to our knowledge. We propose an effective two-stage solution of first allocating budget (resources) to microservices within the budget constraint and then deploying microservice instances on servers to minimize system operational overhead. For budget allocation, we formulate it as the Discrete Time Cost Tradeoff (DTCT) problem, which is NP-hard, present a linear program (LP) based algorithm, and provide a rigorous proof of its worst-case performance guarantee of 4 from the optimal solution. For microservice deployment, we show that it is harder than the NP-hard problem of 1-D bin packing through establishing its mathematical model, and propose a heuristic algorithm of Least First Mapping that greedily places microservice instances on the fewest possible servers to minimize system operation cost. The experiment results of extensive simulations on DAG-based applications of different sizes demonstrate the superior performance of our algorithm in comparison with the existing approaches.
Citations: 0
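The deployment stage in the entry above is related to 1-D bin packing. As a rough illustration of what a greedy "use as few servers as possible" placement can look like, the sketch below uses first-fit decreasing, a standard bin-packing heuristic. It is a stand-in under stated assumptions and not the paper's Least First Mapping algorithm, whose details the abstract does not give. The function name, the single scalar demand per instance, and the uniform server capacity are all hypothetical.

```python
from typing import List


def first_fit_decreasing(demands: List[float], capacity: float) -> List[List[float]]:
    """Greedily place instance demands on servers of equal capacity.

    Assumption: each microservice instance has one scalar resource demand
    and every server has the same capacity. This is textbook first-fit
    decreasing, not the algorithm from the paper.
    """
    servers: List[List[float]] = []  # demands placed on each opened server
    free: List[float] = []           # remaining capacity of each opened server

    # Largest demands first, so big instances claim space before small ones.
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds server capacity {capacity}")
        # Reuse the first already-open server that still has room.
        for i, room in enumerate(free):
            if demand <= room:
                servers[i].append(demand)
                free[i] -= demand
                break
        else:
            # No open server fits, so open a new one.
            servers.append([demand])
            free.append(capacity - demand)
    return servers


if __name__ == "__main__":
    placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
    print(len(placement), "servers:", placement)
```

With the six demands in the usage lines and a capacity of 10, the greedy pass fills two servers completely. First-fit decreasing is only a heuristic, so on other inputs it may open more servers than an optimal packing would.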
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-06 DOI: 10.1016/S0743-7315(25)00041-3
Volume 200, Article 105074
Citations: 0
Deep embedded lightweight CNN network for indoor objects detection on FPGA
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-05 DOI: 10.1016/j.jpdc.2025.105085
Mouna Afif, Riadh Ayachi, Yahia Said, Mohamed Atri
Volume 201, Article 105085
Abstract: Indoor object detection and recognition present an active research axis in computer vision and artificial intelligence fields. Various deep learning-based techniques can be applied to solve object detection problems. With the appearance of deep convolutional neural networks (DCNN), a great breakthrough for various applications was achieved. Indoor object detection presents a primary task that can assist Blind and Visually Impaired persons (BVI) during their navigation. However, building a reliable indoor object detection system used for edge device implementations still presents a serious challenge. To address this problem, we propose in this work to build an indoor object detection system based on a DCNN network. Cross-stage partial network (CSPNet) was used for the detection process and a lightweight backbone based on EfficientNet v2 was used as a network backbone. To ensure a lightweight implementation of the proposed work on FPGA devices, various optimization techniques have been applied to compress the model size and reduce its computation complexity. The proposed indoor object detection system was implemented on a Xilinx ZCU 102 board. Training and testing experiments have been conducted on the proposed indoor objects dataset that counts 11,000 images containing 25 landmark classes and in indoor objects detection dataset. The proposed work achieved 82.60 mAP and 28 FPS for the original version and 80.04 with 35 FPS as processing speed for the compressed version.
Citations: 0
Price-aware resource management for multi-modal DNN inference in collaborative heterogeneous edge environments
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date: 2025-04-04 DOI: 10.1016/j.jpdc.2025.105080
Fengyi Huang, Wenhua Wang, Jianxiong Guo, Wentao Fan, Yang Xu, Tian Wang, Jiannong Cao
Volume 201, Article 105080
Abstract: To address the limitations of ARM64-based AI edge devices, which are energy-efficient but computationally constrained, as well as general-purpose edge servers, this paper proposes a multi-modal Collaborative Heterogeneous Edge Computing (CHEC) architecture that achieves low latency and enhances computational capabilities. The CHEC framework, which is segmented into an edge private cloud and an edge public cloud, endeavors to optimize the profits of Edge Service Providers (ESPs) through dynamic heterogeneous resource management. In particular, it is achieved by formulating the challenge as a multi-stage Mixed-Integer Nonlinear Programming (MINLP) problem. We introduce a resource collaboration system based on resource leasing incorporating three Economic Payment Models (EPMs), ensuring efficient and profitable resource utilization. To tackle this complex issue, we develop a three-layer Hybrid Deep Reinforcement Learning (HDRL) algorithm with EPMs, HDRL-EPMs, for efficient management of dynamic and heterogeneous resources. Extensive simulations confirm the algorithm's ability to ensure convergence and approximate optimal solutions, significantly outperforming existing methods. Testbed experiments demonstrate that the CHEC architecture reduces latency by up to 21.83% in real-world applications, markedly surpassing previous approaches.
Citations: 0