2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS): latest papers, page 2

Adaptive energy minimization of embedded heterogeneous systems using regression-based learning

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347594

Sheng Yang, R. Shafik, G. Merrett, Edward A. Stott, Joshua M. Levine, James J. Davis, B. Al-Hashimi

Abstract: Modern embedded systems consist of heterogeneous computing resources with diverse energy and performance trade-offs. This is because these resources exercise the application tasks differently, generating varying workloads and energy consumption. As a result, minimizing energy consumption in these systems is challenging, as continuous adaptation between application task mapping (i.e. allocating tasks among the computing resources) and dynamic voltage/frequency scaling (DVFS) is required. Existing approaches are limited by the lack of such adaptation with practical validation (Table I). This paper addresses this limitation and proposes a novel adaptive energy minimization approach for embedded heterogeneous systems. Fundamental to this approach is a runtime model, generated through regression-based learning of energy/performance trade-offs between different computing resources in the system. Using this model, an application task is suitably mapped on a computing resource during runtime, ensuring minimum energy consumption for a given application performance requirement. Such mapping is also coupled with a DVFS control to adapt to performance and workload variations. The proposed approach is designed, engineered and validated on a Zynq-ZC702 platform, consisting of CPU, DSP and FPGA cores. Using several image processing applications as case studies, it was demonstrated that the proposed approach can achieve significant energy savings (>70%) when compared to the existing approaches.

Citations: 50
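The regression-then-map idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the profiling samples, the linear model forms (time against 1/f, energy against f), the resource names and the function names `fit_models` and `choose_mapping` are all assumptions made for illustration, and every number is invented.

```python
import numpy as np

# Hypothetical profiling samples per resource: (frequency MHz, time ms, energy mJ).
# All values are invented for illustration; they are not measurements from the paper.
SAMPLES = {
    "cpu":  [(200, 50.0, 9.0), (400, 25.0, 13.0), (800, 12.5, 21.0)],
    "dsp":  [(100, 60.0, 6.0), (200, 30.0, 9.0), (400, 15.0, 15.0)],
    "fpga": [(50, 40.0, 4.5), (100, 20.0, 7.0), (200, 10.0, 12.0)],
}

def fit_models(samples):
    """Fit two least-squares lines: time vs 1/f and energy vs f (assumed model forms)."""
    f, t, e = (np.array(col, dtype=float) for col in zip(*samples))
    time_coef = np.polyfit(1.0 / f, t, 1)   # time   ~ a * (1/f) + b
    energy_coef = np.polyfit(f, e, 1)       # energy ~ c * f + d
    return time_coef, energy_coef

def choose_mapping(samples_by_resource, deadline_ms):
    """Return (resource, frequency, predicted energy) with the lowest predicted
    energy among operating points whose predicted time meets the deadline,
    or None if no operating point qualifies."""
    best = None
    for name, samples in samples_by_resource.items():
        time_coef, energy_coef = fit_models(samples)
        # The sampled frequencies double as the available DVFS levels (assumption).
        for freq, _, _ in samples:
            t_pred = np.polyval(time_coef, 1.0 / freq)
            e_pred = np.polyval(energy_coef, freq)
            if t_pred <= deadline_ms and (best is None or e_pred < best[2]):
                best = (name, freq, float(e_pred))
    return best

if __name__ == "__main__":
    print(choose_mapping(SAMPLES, deadline_ms=32.0))
```

Tightening `deadline_ms` pushes the choice toward higher frequencies or a different resource, which is the coupling between mapping and DVFS that the abstract describes; a runtime version would refit the models as new measurements arrive.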

A versatile and reliable glitch filter for clocks

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347599

Robert Najvirt, A. Steininger

Citations: 3

Efficient parallelization of the Discrete Wavelet Transform algorithm using memory-oblivious optimizations

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347583

A. Keliris, Vasilis Dimitsas, O. Kremmyda, D. Gizopoulos, M. Maniatakos

Abstract: As the rate of single-thread CPU performance improvement per generation has diminished due to lower transistor-speed scaling and energy related issues, researchers and industry have shifted their interest towards multi-core and many-core architectures for improving performance. Comparisons between optimized applications for parallel architectures have been quantified many times in the literature, but contradictory results have been reported mainly due to biased methods of evaluating and comparing these architectures. In this paper, we present memory-oblivious optimizations of the widely used Discrete Wavelet Transform (DWT), and provide detailed comparisons of the algorithm on Intel and AMD multi-core CPUs, Nvidia many-core GPUs, as well as Intel's Xeon Phi many-core coprocessor. Our results indicate that, compared to their respective non-optimized single thread implementations, memory-oblivious optimization delivers up to 17.9×-197.2× performance improvement for the various architectures examined. Furthermore, compared to the state-of-the-art, the presented CPU and GPU memory-oblivious implementations are 2.6× and 1.3× faster respectively than the fastest implementations of DWT currently available in the literature. No comparison to the state-of-the-art can be made for the Xeon Phi, as, to the best of our knowledge, this is the first study that optimizes the DWT for this newfangled architecture.

Citations: 1

Constructing stability-based clock gating with hierarchical clustering

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347593

Bao Le, Djordje Maksimovic, D. Sengupta, Erhan Ergin, Ryan Berryhill, A. Veneris

Abstract: In modern designs, a complex clock distribution network is employed to distribute the clock signal(s) to all the sequential elements. As the functionality of these sequential elements depends heavily on usage scenarios, it is vital that the clock network is optimized for these scenarios. This paper introduces a clock network power optimization methodology based on design usage patterns and stability based clock gating. Specifically, whenever a register retains its value from the previous cycle, a clock gating implementation shuts off its clock and disables data loading to enable power reduction. We first introduce the notion of a stability pattern and its correlation with clock gating efficiency. Next, we introduce a methodology to identify efficient clock gating implementations. In this framework, a clustering algorithm leveraging stability patterns iteratively computes more effective gating implementations. Each implementation is evaluated further on area overhead and critical path delay. If it satisfies all criteria, it is implemented in the design; otherwise, it is sent back to the clustering algorithm to compute new clock gating implementations. Empirical results show 22.6% reduction in clock network power and 16.0% reduction in total power consumption. This confirms the practicality and robustness of the proposed methodology.

Citations: 2
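As a rough illustration of the stability-pattern idea in the abstract above, the sketch below derives, for each register, the set of cycles in which it keeps its previous value, then greedily merges the most similar registers into gating groups. The abstract does not specify the similarity measure or stopping rule, so the Jaccard similarity, the `min_similarity` threshold, the toy traces and all function names here are assumptions; the area-overhead and critical-path checks mentioned in the abstract are not modeled.

```python
from itertools import combinations

# Hypothetical per-cycle register values from a simulation trace (invented data).
TRACES = {
    "r0": [0, 0, 0, 1, 1, 1],
    "r1": [5, 5, 5, 7, 7, 7],
    "r2": [0, 1, 2, 3, 4, 5],
    "r3": [1, 1, 2, 2, 3, 3],
}

def stability_pattern(values):
    """Cycle indices in which the register retains its value from the previous cycle."""
    return {i for i in range(1, len(values)) if values[i] == values[i - 1]}

def jaccard(a, b):
    """Similarity of two stable-cycle sets; never-stable registers score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_registers(traces, min_similarity=0.5):
    """Agglomerative clustering: repeatedly merge the two most similar clusters.
    Each cluster is (member names, cycles in which ALL members are stable),
    since a shared gate can only switch off when every member is stable."""
    clusters = [([name], stability_pattern(vals)) for name, vals in traces.items()]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < min_similarity:
            break  # no remaining pair is similar enough to share a gate
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] & clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

if __name__ == "__main__":
    for members, gated_cycles in cluster_registers(TRACES):
        print(members, sorted(gated_cycles))
```

In this toy trace `r0` and `r1` are stable in exactly the same cycles and end up sharing a group, while `r2` never holds its value and so offers no gating opportunity under this measure.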

Inferring custom architectures from OpenCL

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347581

Krzysztof Kepa, Ritesh Soni, P. Athanas

Abstract: OpenCL has emerged as the de facto cross-platform standard in the GPU-based HPC computing domain. However, in FPGA-based HPC systems, OpenCL-to-FPGA compilers often yield suboptimal results due to the rigid architecture, limited shared-memory, and non-existent inter-work-item communication pathways implied by the OpenCL model. In this work, a methodology of inferring application-specific OpenCL “work-item” interfaces based on kernel code analysis is explored. A proof-of-concept prototype is implemented using an OpenCL source-to-source translator, which allows automated generation of the FPGA-based hardware accelerators directly from the OpenCL sources. The type and implementation of the inferred interface are tailored to match the data access patterns within the kernel. The inferred interface overcomes the limitations of the rigid OpenCL architecture and communication model. The presented approach achieves a ~30x speedup over the generic memory-based approach for an application with 16 work-items. A set of OpenCL coding patterns targeting FPGA-based HPC systems is also introduced. This technique is demonstrated on a popular bioinformatics algorithm, yet is applicable to any such algorithm with non-standard inter-cell communications.

Citations: 1

Calculation of worst-case execution time for multicore processors using deterministic execution

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347584

Hamid Mushtaq, Z. Al-Ars, K. Bertels

Citations: 3

Frequency-domain modeling of ground bounce and substrate noise for synchronous and GALS systems

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347597

M. Babić, Xin Fan, M. Krstic

Citations: 6

An unconventional computing technique for ultra-fast and ultra-low power data mining

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347585

V. Canals, A. Morro, A. Oliver, M. Alomar, J. Rosselló

Citations: 1

Evaluation and mitigation of aging effects on a digital on-chip voltage and temperature sensor

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347595

M. Altieri, S. Lesecq, D. Puschini, O. Héron, E. Beigné, J. Rodas

Citations: 4

Unified Power Format (UPF) methodology in a vendor independent flow

2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS) Pub Date : 2015-12-07 DOI: 10.1109/PATMOS.2015.7347591

Emilie Garat, David Coriat, E. Beigné, L. Stefanazzi

Citations: 7