International Journal of Parallel Programming — Latest Publications

ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-26 | DOI: 10.1007/s10766-024-00761-4
Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Antonio Del Vecchio, Maicol Ciani, Davide Rossi, Luca Benini, Andrea Bartolini
Abstract: High-performance computing (HPC) processors are nowadays integrated cyber-physical systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms for managing the ever-increasing number of cores, power states, and process, voltage, and temperature variability. This paper presents ControlPULP, an open-source HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling, coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving a 4.9x speedup compared to single-core execution and enabling more advanced power management algorithms within the control hyper-period at a low area overhead of about 0.1% of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoC paradigm, achieving DVFS tracking with a mean deviation within 3% of the plant's thermal design power (TDP) against a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under compute-intensive workloads.
Citations: 0
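The abstract describes the controller architecture rather than a specific control law, so the following is a purely illustrative sketch (in Python, not the paper's FreeRTOS/C firmware) of the kind of per-core power-capping step a reactive power control firmware might execute once per control hyper-period; all names, gains, and limits are assumptions.

```python
# Illustrative sketch only: a naive per-core power-capping step of the kind a
# power-control firmware might run each hyper-period. Not the ControlPULP PCF.

def power_cap_step(core_power_w, freq_mhz, tdp_w, f_min=800, f_max=3200, gain=50.0):
    """Scale each core's frequency target by the chip-level power headroom."""
    total = sum(core_power_w)
    error = tdp_w - total                     # positive: headroom, negative: over budget
    step = gain * error / len(core_power_w)   # spread the correction across cores
    return [min(f_max, max(f_min, f + step)) for f in freq_mhz]

# Example: 4 cores drawing 260 W against a 240 W TDP -> frequency targets are lowered.
print(power_cap_step([70, 65, 60, 65], [3000, 3000, 2800, 2900], tdp_w=240))
```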
Investigating Methods for ASPmT-Based Design Space Exploration in Evolutionary Product Design
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-24 | DOI: 10.1007/s10766-024-00763-2
Luise Müller, Philipp Wanko, Christian Haubelt, Torsten Schaub
Abstract: Nowadays, product development is challenged by increasing system complexity and stringent time-to-market requirements. To handle demanding market requirements, knowledge from prior product generations is used to derive new, but partially similar, product versions. The concept of product generation engineering thus allows manufacturers to release high-quality products within short development times. In this paper, we therefore propose a novel approach to evaluate the similarity of two product implementations based on the concept of the Hamming distance. This allows similarity information to be used in various heuristics and strategies and thus to improve the product design process. In a wide set of cases, we investigate the quality and similarity of design points. In our experiments, the use of strategies leads to significantly shorter search times, but also tends to be too restrictive in certain cases. At the same time, the quality of the solutions found by the heuristic design space exploration is shown to be as good as or better than that of a search from scratch, and considerably closer solutions have been found as part of the non-dominated solution front.
Citations: 0
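As a minimal sketch of the Hamming-distance idea referred to in the abstract: assuming a product implementation can be encoded as an equal-length tuple of discrete design decisions (the encoding below is an illustrative assumption, not the authors' representation), the distance is simply the number of decisions that differ.

```python
# Minimal sketch: Hamming distance between two product implementations, assuming
# each is encoded as an equal-length tuple of discrete design decisions
# (e.g. which resource each task is mapped to). Encoding is illustrative only.

def hamming_distance(impl_a, impl_b):
    if len(impl_a) != len(impl_b):
        raise ValueError("implementations must be encoded over the same decisions")
    return sum(a != b for a, b in zip(impl_a, impl_b))

# Example: two mappings of five tasks onto processing elements.
prev_gen = ("cpu0", "cpu0", "dsp", "cpu1", "fpga")
new_gen  = ("cpu0", "cpu1", "dsp", "cpu1", "fpga")
print(hamming_distance(prev_gen, new_gen))  # -> 1, i.e. one decision changed
```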
Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-22 | DOI: 10.1007/s10766-024-00760-5
Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig
Abstract: Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, simple metrics, such as the ℓ1-norm, are used for deciding which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing both the inference time and the error rate of the CNN. Experimental results show that our approach can speed up inference by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.
Citations: 0
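For context on the ℓ1-norm baseline that the abstract contrasts with, the sketch below ranks the filters of a single convolutional layer by their ℓ1 norm and marks the smallest ones for pruning; the tensor shape and pruning ratio are illustrative assumptions, and the paper's own hardware-aware, explainability-based criterion is not reproduced here.

```python
import numpy as np

# Sketch of the simple l1-norm criterion the abstract mentions as a baseline:
# rank the filters of one conv layer by the sum of absolute weights and mark
# the smallest ones for pruning. Shapes and pruning ratio are illustrative.

def l1_prune_mask(weights, prune_ratio=0.25):
    """weights: (num_filters, in_channels, kH, kW); returns filter indices to prune."""
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_prune = int(prune_ratio * weights.shape[0])
    return np.argsort(scores)[:n_prune]          # filters with the smallest l1 norm

rng = np.random.default_rng(0)
layer = rng.standard_normal((64, 32, 3, 3))      # a hypothetical conv layer
print(l1_prune_mask(layer)[:5])                  # first few filters marked for removal
```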
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-16 | DOI: 10.1007/s10766-024-00762-3
Milad Kokhazadeh, Georgios Keramidas, Vasilios Kelefouras, Iakovos Stamoulis
Abstract: Deep Neural Networks (DNNs) have made significant advances in various fields, including speech recognition and image processing. Typically, modern DNNs are both compute- and memory-intensive, so their deployment on low-end devices is a challenging task. A well-known technique to address this problem is Low-Rank Factorization (LRF), where a weight tensor is approximated by one or more lower-rank tensors, reducing both the memory size and the number of executed tensor operations. However, employing LRF is a multi-parametric optimization process involving a huge design space, where different design points represent different solutions trading off the number of FLOPs, the memory size, and the prediction accuracy of the DNN model. As a result, extracting an efficient solution is a complex and time-consuming process. In this work, a new methodology is presented that formulates the LRF problem as a (FLOPs vs. memory vs. prediction accuracy) design space exploration (DSE) problem. The DSE space is then drastically pruned by removing inefficient solutions. Our experimental results show that the design space can be efficiently pruned, extracting only a limited set of solutions with improved accuracy, memory footprint, and FLOPs compared to the original (non-factorized) model. Our methodology has been developed as a stand-alone, parameterized module integrated into the T3F library of TensorFlow 2.X.
Citations: 0
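The paper targets Tensor Train decomposition through the T3F library; as a simpler stand-in that illustrates the same FLOPs/memory-versus-accuracy trade-off, the sketch below approximates a dense weight matrix with a rank-truncated SVD (matrix sizes and the chosen rank are assumptions, not values from the paper).

```python
import numpy as np

# Simplified stand-in for Low-Rank Factorization: approximate a dense weight
# matrix W (m x n) by two factors of rank r, trading accuracy for parameters.
# The paper itself uses Tensor Train decomposition (T3F/TensorFlow); this
# truncated SVD only illustrates the memory/FLOPs-vs-accuracy trade-off.

def low_rank_factors(W, rank):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # m x r
    B = Vt[:rank, :]                    # r x n
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024))                       # hypothetical layer weights
A, B = low_rank_factors(W, rank=32)
print("params:", W.size, "->", A.size + B.size)            # memory saving
print("rel. error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```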
Access Interval Prediction by Partial Matching for Tightly Coupled Memory Systems
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2024-02-13 | DOI: 10.1007/s10766-024-00764-1
Viktor Razilov, Robert Wittig, Emil Matúš, Gerhard Fettweis
Abstract: In embedded systems, tightly coupled memories (TCMs) are usually shared between multiple masters for the sake of hardware efficiency and software flexibility. On the one hand, memory sharing improves area utilization; on the other hand, it can lead to performance degradation due to an increase in access conflicts. To mitigate the associated performance penalty, access interval prediction (AIP) has been proposed. In a similar fashion to branch prediction, AIP exploits program flow regularity to predict the cycle of the next memory access. We show that this structural similarity allows state-of-the-art branch predictors, such as Prediction by Partial Matching (PPM) and the TAgged GEometric history length (TAGE) branch predictor, to be adapted. Our analysis of memory access traces reveals that PPM correctly predicts 99 percent of memory accesses. As PPM does not lend itself to hardware implementation, we also present the PPM-based TAGE access interval predictor, which attains an accuracy of over 97 percent, outperforming all previously presented implementable AIP schemes.
Citations: 0
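As a toy illustration of prediction by partial matching applied to access intervals (far simpler than the implementable TAGE-based design the paper proposes), the sketch below predicts the next access interval from the longest previously observed context of recent intervals; the maximum order and fallback rule are assumptions.

```python
from collections import defaultdict, Counter

# Toy Prediction-by-Partial-Matching over access intervals: remember which
# interval followed each recent context of intervals and predict using the
# longest matching context. Context length and fallback are illustrative.

class PPMIntervalPredictor:
    def __init__(self, max_order=3):
        self.max_order = max_order
        self.tables = [defaultdict(Counter) for _ in range(max_order + 1)]
        self.history = []

    def predict(self):
        # Try the longest context first, then progressively shorter ones.
        for order in range(min(self.max_order, len(self.history)), -1, -1):
            ctx = tuple(self.history[len(self.history) - order:])
            counts = self.tables[order].get(ctx)
            if counts:
                return counts.most_common(1)[0][0]
        return None                                # no history yet

    def update(self, interval):
        for order in range(min(self.max_order, len(self.history)) + 1):
            ctx = tuple(self.history[len(self.history) - order:])
            self.tables[order][ctx][interval] += 1
        self.history = (self.history + [interval])[-self.max_order:]

p = PPMIntervalPredictor()
for iv in [4, 4, 8, 4, 4, 8, 4, 4]:
    p.update(iv)
print(p.predict())  # -> 8, matching the longest context (8, 4, 4) seen earlier
```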
Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method
CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-11-13 | DOI: 10.1007/s10766-023-00759-4
Yingpeng Wen, Zhilin Qiu, Dongyu Zhang, Dan Huang, Nong Xiao, Liang Lin
Citations: 0
A Hybrid Machine Learning Model for Code Optimization
CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-09-22 | DOI: 10.1007/s10766-023-00758-5
Yacine Hakimi, Riyadh Baghdadi, Yacine Challal
Citations: 0
GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-07-21 | DOI: 10.1007/s10766-023-00755-8
Polychronis Velentzas, M. Vassilakopoulos, A. Corral, C. Antonopoulos
Citations: 0
Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-07-10 | DOI: 10.1007/s10766-023-00754-9 | pp. 256–270
Vsevolod Bohaienko
Citations: 0
Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks
IF 1.5 | CAS Quartile 4 (Computer Science)
International Journal of Parallel Programming | Pub Date: 2023-05-05 | DOI: 10.1007/s10766-023-00753-w | pp. 1–25
Daniel Presser, Frank Siqueira
Citations: 0