International Journal of Parallel Programming — Latest Publications

ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-26 | DOI: 10.1007/s10766-024-00761-4
Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Antonio Del Vecchio, Maicol Ciani, Davide Rossi, Luca Benini, Andrea Bartolini
Abstract: High-performance computing (HPC) processors are nowadays integrated cyber-physical systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms for managing the ever-increasing number of cores, power states, and process, voltage, and temperature variability. This paper presents ControlPULP, an open-source HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling, coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving a 4.9x speedup compared to single-core execution and enabling more advanced power management algorithms within the control hyper-period at a low area overhead of about 0.1% of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoC paradigm, achieving DVFS tracking with a mean deviation within 3% of the plant's thermal design power (TDP) against a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under compute-intensive workloads.
Citations: 0
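The abstract describes the controller architecture rather than a specific control law, so the following is a purely illustrative sketch (in Python, not the paper's FreeRTOS/C firmware) of the kind of per-core power-capping step a reactive power control firmware might execute once per control hyper-period; all names, gains, and limits are assumptions.

```python
# Illustrative sketch only: a naive per-core power-capping step of the kind a
# power-control firmware might run each hyper-period. Not the ControlPULP PCF.

def power_cap_step(core_power_w, freq_mhz, tdp_w, f_min=800, f_max=3200, gain=50.0):
    """Scale each core's frequency target by the chip-level power headroom."""
    total = sum(core_power_w)
    error = tdp_w - total                     # positive: headroom, negative: over budget
    step = gain * error / len(core_power_w)   # spread the correction across cores
    return [min(f_max, max(f_min, f + step)) for f in freq_mhz]

# Example: 4 cores drawing 260 W against a 240 W TDP -> frequency targets are lowered.
print(power_cap_step([70, 65, 60, 65], [3000, 3000, 2800, 2900], tdp_w=240))
```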
Investigating Methods for ASPmT-Based Design Space Exploration in Evolutionary Product Design
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-24 | DOI: 10.1007/s10766-024-00763-2
Luise Müller, Philipp Wanko, Christian Haubelt, Torsten Schaub
Abstract: Nowadays, product development is challenged by increasing system complexity and stringent time-to-market requirements. To handle demanding market requirements, knowledge from prior product generations is used to derive new, but partially similar, product versions. The concept of product generation engineering thus allows manufacturers to release high-quality products within short development times. In this paper, we therefore propose a novel approach to evaluate the similarity of two product implementations based on the concept of the Hamming distance. This allows similarity information to be used in various heuristics and strategies and thus to improve the product design process. In a wide set of cases, we investigate the quality and similarity of design points. In our experiments, the use of strategies leads to significantly shorter search times, but also tends to be too restrictive in certain cases. At the same time, the quality of the solutions found by the heuristic design space exploration is shown to be as good as or better than that of a search from scratch, and considerably closer solutions have been found as part of the non-dominated solution front.
Citations: 0
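As a minimal sketch of the Hamming-distance idea referred to in the abstract: assuming a product implementation can be encoded as an equal-length tuple of discrete design decisions (the encoding below is an illustrative assumption, not the authors' representation), the distance is simply the number of decisions that differ.

```python
# Minimal sketch: Hamming distance between two product implementations, assuming
# each is encoded as an equal-length tuple of discrete design decisions
# (e.g. which resource each task is mapped to). Encoding is illustrative only.

def hamming_distance(impl_a, impl_b):
    if len(impl_a) != len(impl_b):
        raise ValueError("implementations must be encoded over the same decisions")
    return sum(a != b for a, b in zip(impl_a, impl_b))

# Example: two mappings of five tasks onto processing elements.
prev_gen = ("cpu0", "cpu0", "dsp", "cpu1", "fpga")
new_gen  = ("cpu0", "cpu1", "dsp", "cpu1", "fpga")
print(hamming_distance(prev_gen, new_gen))  # -> 1, i.e. one decision changed
```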
Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-22 | DOI: 10.1007/s10766-024-00760-5
Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig
Abstract: Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, simple metrics, such as the ℓ1-norm, are used for deciding which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing both the inference time and the error rate of the CNN. Experimental results show that our approach can speed up inference by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.
Citations: 0
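For context on the ℓ1-norm baseline that the abstract contrasts with, the sketch below ranks the filters of a single convolutional layer by their ℓ1 norm and marks the smallest ones for pruning; the tensor shape and pruning ratio are illustrative assumptions, and the paper's own hardware-aware, explainability-based criterion is not reproduced here.

```python
import numpy as np

# Sketch of the simple l1-norm criterion the abstract mentions as a baseline:
# rank the filters of one conv layer by the sum of absolute weights and mark
# the smallest ones for pruning. Shapes and pruning ratio are illustrative.

def l1_prune_mask(weights, prune_ratio=0.25):
    """weights: (num_filters, in_channels, kH, kW); returns filter indices to prune."""
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_prune = int(prune_ratio * weights.shape[0])
    return np.argsort(scores)[:n_prune]          # filters with the smallest l1 norm

rng = np.random.default_rng(0)
layer = rng.standard_normal((64, 32, 3, 3))      # a hypothetical conv layer
print(l1_prune_mask(layer)[:5])                  # first few filters marked for removal
```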
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-16 | DOI: 10.1007/s10766-024-00762-3
Milad Kokhazadeh, Georgios Keramidas, Vasilios Kelefouras, Iakovos Stamoulis
Abstract: Deep Neural Networks (DNNs) have made significant advances in various fields, including speech recognition and image processing. Typically, modern DNNs are both compute- and memory-intensive, so their deployment on low-end devices is a challenging task. A well-known technique to address this problem is Low-Rank Factorization (LRF), where a weight tensor is approximated by one or more lower-rank tensors, reducing both the memory size and the number of executed tensor operations. However, employing LRF is a multi-parametric optimization process involving a huge design space, where different design points represent different solutions trading off the number of FLOPs, the memory size, and the prediction accuracy of the DNN model. As a result, extracting an efficient solution is a complex and time-consuming process. In this work, a new methodology is presented that formulates the LRF problem as a (FLOPs vs. memory vs. prediction accuracy) design space exploration (DSE) problem. The DSE space is then drastically pruned by removing inefficient solutions. Our experimental results show that the design space can be efficiently pruned, extracting only a limited set of solutions with improved accuracy, memory footprint, and FLOPs compared to the original (non-factorized) model. Our methodology has been developed as a stand-alone, parameterized module integrated into the T3F library of TensorFlow 2.X.
Citations: 0
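The paper targets Tensor Train decomposition through the T3F library; as a simpler stand-in that illustrates the same FLOPs/memory-versus-accuracy trade-off, the sketch below approximates a dense weight matrix with a rank-truncated SVD (matrix sizes and the chosen rank are assumptions, not values from the paper).

```python
import numpy as np

# Simplified stand-in for Low-Rank Factorization: approximate a dense weight
# matrix W (m x n) by two factors of rank r, trading accuracy for parameters.
# The paper itself uses Tensor Train decomposition (T3F/TensorFlow); this
# truncated SVD only illustrates the memory/FLOPs-vs-accuracy trade-off.

def low_rank_factors(W, rank):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # m x r
    B = Vt[:rank, :]                    # r x n
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))                       # hypothetical layer weights
A, B = low_rank_factors(W, rank=32)
print("params:", W.size, "->", A.size + B.size)            # memory saving
print("rel. error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```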
Access Interval Prediction by Partial Matching for Tightly Coupled Memory Systems
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-13 | DOI: 10.1007/s10766-024-00764-1
Viktor Razilov, Robert Wittig, Emil Matúš, Gerhard Fettweis
Abstract: In embedded systems, tightly coupled memories (TCMs) are usually shared between multiple masters for the sake of hardware efficiency and software flexibility. On the one hand, memory sharing improves area utilization; on the other hand, it can lead to performance degradation due to an increase in access conflicts. To mitigate the associated performance penalty, access interval prediction (AIP) has been proposed. In a similar fashion to branch prediction, AIP exploits program flow regularity to predict the cycle of the next memory access. We show that this structural similarity allows state-of-the-art branch predictors, such as Prediction by Partial Matching (PPM) and the TAgged GEometric history length (TAGE) branch predictor, to be adapted. Our analysis of memory access traces reveals that PPM correctly predicts 99 percent of memory accesses. As PPM does not lend itself to hardware implementation, we also present the PPM-based TAGE access interval predictor, which attains an accuracy of over 97 percent, outperforming all previously presented implementable AIP schemes.
Citations: 0
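As a toy illustration of prediction by partial matching applied to access intervals (far simpler than the implementable TAGE-based design the paper proposes), the sketch below predicts the next access interval from the longest previously observed context of recent intervals; the maximum order and fallback rule are assumptions.

```python
from collections import defaultdict, Counter

# Toy Prediction-by-Partial-Matching over access intervals: remember which
# interval followed each recent context of intervals and predict using the
# longest matching context. Context length and fallback are illustrative.

class PPMIntervalPredictor:
    def __init__(self, max_order=3):
        self.max_order = max_order
        self.tables = [defaultdict(Counter) for _ in range(max_order + 1)]
        self.history = []

    def predict(self):
        # Try the longest context first, then progressively shorter ones.
        for order in range(min(self.max_order, len(self.history)), -1, -1):
            ctx = tuple(self.history[len(self.history) - order:])
            counts = self.tables[order].get(ctx)
            if counts:
                return counts.most_common(1)[0][0]
        return None                                # no history yet

    def update(self, interval):
        for order in range(min(self.max_order, len(self.history)) + 1):
            ctx = tuple(self.history[len(self.history) - order:])
            self.tables[order][ctx][interval] += 1
        self.history = (self.history + [interval])[-self.max_order:]

p = PPMIntervalPredictor()
for iv in [4, 4, 8, 4, 4, 8, 4, 4]:
    p.update(iv)
print(p.predict())  # -> 8, matching the longest context (8, 4, 4) seen earlier
```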
Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method
CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-11-13 | DOI: 10.1007/s10766-023-00759-4
Yingpeng Wen, Zhilin Qiu, Dongyu Zhang, Dan Huang, Nong Xiao, Liang Lin
Citations: 0
A Hybrid Machine Learning Model for Code Optimization
CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-09-22 | DOI: 10.1007/s10766-023-00758-5
Yacine Hakimi, Riyadh Baghdadi, Yacine Challal
Citations: 0
GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-07-21 | DOI: 10.1007/s10766-023-00755-8
Polychronis Velentzas, M. Vassilakopoulos, A. Corral, C. Antonopoulos
Citations: 0
Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-07-10 | DOI: 10.1007/s10766-023-00754-9 | pp. 256–270
Vsevolod Bohaienko
Citations: 0
Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-05-05 | DOI: 10.1007/s10766-023-00753-w | pp. 1–25
Daniel Presser, Frank Siqueira
Citations: 0