2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS): latest papers

Adaptive energy minimization of embedded heterogeneous systems using regression-based learning
Sheng Yang, R. Shafik, G. Merrett, Edward A. Stott, Joshua M. Levine, James J. Davis, B. Al-Hashimi
{"title":"Adaptive energy minimization of embedded heterogeneous systems using regression-based learning","authors":"Sheng Yang, R. Shafik, G. Merrett, Edward A. Stott, Joshua M. Levine, James J. Davis, B. Al-Hashimi","doi":"10.1109/PATMOS.2015.7347594","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347594","url":null,"abstract":"Modern embedded systems consist of heterogeneous computing resources with diverse energy and performance trade-offs. This is because these resources exercise the application tasks differently, generating varying workloads and energy consumption. As a result, minimizing energy consumption in these systems is challenging as continuous adaptation between application task mapping (i.e. allocating tasks among the computing resources) and dynamic voltage/frequency scaling (DVFS) is required. Existing approaches have limitations due to lack of such adaptation with practical validation (Table I). This paper addresses such limitation and proposes a novel adaptive energy minimization approach for embedded heterogeneous systems. Fundamental to this approach is a runtime model, generated through regression-based learning of energy/performance trade-offs between different computing resources in the system. Using this model, an application task is suitably mapped on a computing resource during runtime, ensuring minimum energy consumption for a given application performance requirement. Such mapping is also coupled with a DVFS control to adapt to performance and workload variations. The proposed approach is designed, engineered and validated on a Zynq-ZC702 platform, consisting of CPU, DSP and FPGA cores. Using several image processing applications as case studies, it was demonstrated that our proposed approach can achieve significant energy savings (>70%), when compared to the existing approaches.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"171 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114017906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50
A versatile and reliable glitch filter for clocks
Robert Najvirt, A. Steininger
{"title":"A versatile and reliable glitch filter for clocks","authors":"Robert Najvirt, A. Steininger","doi":"10.1109/PATMOS.2015.7347599","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347599","url":null,"abstract":"In today's complex system-on-chip architectures the protection of the clock(s) against glitches introduced by environmental disturbances, attackers, or gating measures is becoming increasingly important. Glitch protection is a delicate issue in the digital domain, as it is inherently coupled with metastability issues. The circuit we propose in this paper outputs a clock that strictly follows an input reference clock in the regular case, but guarantees a minimum output pulse width even in case of arbitrary behavior of the reference. We will give a thorough analysis showing that, unlike most existing solutions, our circuit can handle metastability without any residual risk of upsets. Still its implementation is very simple. Our theoretical claims will be supported by simulation results. Furthermore, we will give some examples on possible use cases for such a circuit, like clock gating, clock self-repair, or defense against clock attacks.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133367304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Efficient parallelization of the Discrete Wavelet Transform algorithm using memory-oblivious optimizations
A. Keliris, Vasilis Dimitsas, O. Kremmyda, D. Gizopoulos, M. Maniatakos
{"title":"Efficient parallelization of the Discrete Wavelet Transform algorithm using memory-oblivious optimizations","authors":"A. Keliris, Vasilis Dimitsas, O. Kremmyda, D. Gizopoulos, M. Maniatakos","doi":"10.1109/PATMOS.2015.7347583","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347583","url":null,"abstract":"As the rate of single-thread CPU performance improvement per generation has diminished due to lower transistor-speed scaling and energy related issues, researchers and industry have shifted their interest towards multi-core and many-core architectures for improving performance. Comparisons between optimized applications for parallel architectures have been quantified many times in the literature, but contradictory results have been reported mainly due to biased methods of evaluating and comparing these architectures. In this paper, we present memory-oblivious optimizations of the widely used Discrete Wavelet Transform (DWT), and provide detailed comparisons of the algorithm on Intel and AMD multi-core CPUs, Nvidia many-core GPUs, as well as the Intel's Xeon Phi many-core coprocessor. Our results indicate that, compared to their respective non-optimized single thread implementations, memory-oblivious optimization delivers up to 17.9×-197.2× performance improvement for the various architectures examined. Furthermore, compared to the state-of-the-art, the presented CPU and GPU memory-oblivious implementations are 2.6× and 1.3× faster respectively than the fastest implementations of DWT currently available in the literature. No comparison to the state-of-the-art can be made for the Xeon Phi, as, to the best of our knowledge, this is the first study that optimizes the DWT for this newfangled architecture.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128885572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Constructing stability-based clock gating with hierarchical clustering
Bao Le, Djordje Maksimovic, D. Sengupta, Erhan Ergin, Ryan Berryhill, A. Veneris
{"title":"Constructing stability-based clock gating with hierarchical clustering","authors":"Bao Le, Djordje Maksimovic, D. Sengupta, Erhan Ergin, Ryan Berryhill, A. Veneris","doi":"10.1109/PATMOS.2015.7347593","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347593","url":null,"abstract":"In modern designs, a complex clock distribution network is employed to distribute the clock signal(s) to all the sequential elements. As the functionality of these sequential elements depends heavily on usage scenarios, it is vital that the clock network is optimized for these scenarios. This paper introduces a clock network power optimization methodology based on design usage patterns and stability based clock gating. Specifically, whenever a register retains its value from the previous cycle, a clock gating implementation shuts off its clock and disables data loading to enable power reduction. We first introduce the notion of a stability pattern and its correlation with clock gating efficiency. Next, we introduce a methodology to identify efficient clock gating implementations. In this framework, a clustering algorithm leveraging stability patterns iteratively computes more effective gating implementations. Each implementation is evaluated further on area overhead and critical path delay. If it satisfies all criteria, it is implemented in the design; otherwise, it is sent back to the clustering algorithm to compute new clock gating implementations. Empirical results show 22.6% reduction in clock network power and 16.0% reduction in total power consumption. This confirms the practicality and robustness of the proposed methodology.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126401840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Inferring custom architectures from OpenCL
Krzysztof Kepa, Ritesh Soni, P. Athanas
{"title":"Inferring custom architectures from OpenCL","authors":"Krzysztof Kepa, Ritesh Soni, P. Athanas","doi":"10.1109/PATMOS.2015.7347581","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347581","url":null,"abstract":"OpenCL has emerged as the de facto cross-platform standard in the GPU-based HPC computing domain. However, in FPGA-based HPC systems, OpenCL-to-FPGA compilers often yield suboptimal results due to the rigid architecture, limited shared-memory, and non-existent inter-work-item communication pathways implied by the OpenCL model. In this work, a methodology of inferring application-specific OpenCL “work-item” interfaces based on kernel code analysis is explored. A proof-of-concept prototype is implemented using an OpenCL source-to-source translator, which allows automated generation of the FPGA-based hardware accelerators directly from the OpenCL sources. The type and implementation of the inferred interface is tailored to match the data access patterns within the kernel. The inferred interface outperforms limitations of the OpenCL rigid architecture and communication model. The presented approach achieves a ~30x speedup over the generic memory-based approach for a 16 work-items application. A set of OpenCL coding patterns targeting FPGA-based HPC systems is also introduced. This technique is demonstrated on a popular bioinformatics algorithm, yet is applicable to any such algorithm with non-standard inter-cell communications.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132096907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Calculation of worst-case execution time for multicore processors using deterministic execution
Hamid Mushtaq, Z. Al-Ars, K. Bertels
{"title":"Calculation of worst-case execution time for multicore processors using deterministic execution","authors":"Hamid Mushtaq, Z. Al-Ars, K. Bertels","doi":"10.1109/PATMOS.2015.7347584","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347584","url":null,"abstract":"Safety critical real time systems need to meet strict timing deadlines. We use a model checking based approach to calculate the WCET, where we apply optimizations to reduce the number of states stored by the model checker. Furthermore, we used deterministic shared memory accesses to further reduce calculation time, memory and number of states needed for calculating WCET. By optimizing the model checking code, we were able to complete benchmarks which otherwise were having state explosion problems. Furthermore, by using deterministic execution, we significantly reduced the calculation time (up to 158×), memory (up to 89×) and states needed (up to 188×) for calculating WCET with a negligible increase (up to 4%) in the calculated WCET for a multicore system with 4 cores. Lastly, unlike other state-of-the-art approaches, that perform binary search to search the WCET by running several iterations, our method calculates WCET in just one iteration. Taking all these optimizations into consideration, the gain in speed was from 1775× to 2471× for 4 threads.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126661343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Frequency-domain modeling of ground bounce and substrate noise for synchronous and GALS systems
M. Babić, Xin Fan, M. Krstic
{"title":"Frequency-domain modeling of ground bounce and substrate noise for synchronous and GALS systems","authors":"M. Babić, Xin Fan, M. Krstic","doi":"10.1109/PATMOS.2015.7347597","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347597","url":null,"abstract":"In this work, the ground bounce noise has been modeled and analyzed in frequency domain, for both synchronous and GALS (globally asynchronous, locally synchronous) systems. The analysis has been performed analytically, and validated by numerical simulations in MATLAB. Package parasitics and power distribution network have been coarsely modeled by a simple lumped model, while switching currents have been modeled as periodic triangular pulses. Dominant components of spectrum are determined, and the impact of their distribution on the requirements for substrate modeling has been discussed. It has been concluded that resistive substrate approximation introduces large errors for systems with small decoupling capacitances, while it can be satisfactory for systems with large decoupling capacitances.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"17 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130920077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
An unconventional computing technique for ultra-fast and ultra-low power data mining
V. Canals, A. Morro, A. Oliver, M. Alomar, J. Rosselló
{"title":"An unconventional computing technique for ultra-fast and ultra-low power data mining","authors":"V. Canals, A. Morro, A. Oliver, M. Alomar, J. Rosselló","doi":"10.1109/PATMOS.2015.7347585","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347585","url":null,"abstract":"In this work we review the basic principles of stochastic logic and propose its application to probabilistic-based pattern-recognition analysis. The proposed technique is the implementation of a parallel comparison of data with respect to various pre-stored categories. We design smart pulse-based stochastic-logic blocks to provide an efficient pattern recognition analysis. The proposed architecture can speed up the screening process of huge databases by two orders of magnitude with respect to classical software-based solutions, thus implying a great improvement in terms of total performance (speed and power dissipation).","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123066317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Evaluation and mitigation of aging effects on a digital on-chip voltage and temperature sensor
M. Altieri, S. Lesecq, D. Puschini, O. Héron, E. Beigné, J. Rodas
{"title":"Evaluation and mitigation of aging effects on a digital on-chip voltage and temperature sensor","authors":"M. Altieri, S. Lesecq, D. Puschini, O. Héron, E. Beigné, J. Rodas","doi":"10.1109/PATMOS.2015.7347595","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347595","url":null,"abstract":"Power efficiency is a tremendous challenge for high performance embedded systems under energy constraints. Fine grain Dynamic Voltage and Frequency Scaling approaches are usually implemented in order to meet these conflicting objectives. Moreover, these techniques can be improved if local and on-the-fly monitoring of the dynamic variations is performed. A low-cost onchip general purpose sensor associated with an appropriate data fusion technique has been recently developed in order to monitor local temperature and voltage conditions. However, reliability has become a major concern as the technology scales below 40nm. The aging variation is not anymore negligible and must be taken into account during the monitor design and operation. This paper revisits such a sensor under both BTI and HCI aging effects in 28nm STMicroelectronics technology. A simple recalibration method is also proposed to mitigate the aging effects on the VT estimation.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127605468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Unified Power Format (UPF) methodology in a vendor independent flow
Emilie Garat, David Coriat, E. Beigné, L. Stefanazzi
{"title":"Unified Power Format (UPF) methodology in a vendor independent flow","authors":"Emilie Garat, David Coriat, E. Beigné, L. Stefanazzi","doi":"10.1109/PATMOS.2015.7347591","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347591","url":null,"abstract":"To provide designers with an efficient low power design flow, several methodologies have been proposed such as the Unified Power Format (UPF). The main issue faced by designers is the non-interoperability of those methods across different Computer Aided Design (CAD) tools. Although the UPF standard was originally created with interoperability in mind, few of its constructs are actually supported by all CAD vendors. In this paper, we aim at providing a UPF 2.0 methodology that is compatible with different tools. The proposed case study is a circuit with three power domains and a cross-vendor UPF specification. This paper demonstrates a full low power design flow, with formal power checking, power aware simulation, synthesis and back-end.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132035696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7