2015 33rd IEEE International Conference on Computer Design (ICCD): latest publications

An automated design flow for approximate circuits based on reduced precision redundancy

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357088

D. J. Pagliari, A. Calimera, E. Macii, M. Poncino

Abstract: Reduced Precision Redundancy (RPR) is a popular Approximate Computing technique, in which a circuit operated in Voltage Over-Scaling (VOS) is paired with a reduced-bitwidth and faster replica so that VOS-induced timing errors are partially recovered by the replica, and their impact is mitigated. Previous works have provided various examples of effective implementations of RPR, which however suffer from three limitations: first, these circuits are designed using ad-hoc procedures, and no generalization is provided; second, error impact analysis is carried out statistically, thus neglecting issues like non-elementary data distribution and temporal correlation; last, only dynamic power was considered in the optimization. In this work we propose a new generalized approach to RPR that makes it possible to overcome all these limitations, leveraging the capabilities of state-of-the-art synthesis and simulation tools. By sacrificing theoretical provability in favor of an empirical input-based analysis, we build a design tool able to automatically add RPR to a preexisting gate-level netlist. Thanks to this method, we are able to refute some of the conclusions drawn in previous works, in particular those related to statistical assumptions on inputs; we show that a given input distribution may yield extremely different results depending on its temporal behavior.

Citations: 3
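
The RPR idea summarized in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' design tool: the function names, the choice of an adder as the protected circuit, the way the replica drops low-order bits, and the threshold value are all hypothetical. It uses the commonly described RPR decision rule, in which the replica's output replaces the main output whenever the two differ by more than a threshold.

```python
def reduced_precision_add(a: int, b: int, dropped_bits: int) -> int:
    """Hypothetical replica: add only the upper bits of each operand."""
    mask = ~((1 << dropped_bits) - 1)  # zero out the low-order bits
    return (a & mask) + (b & mask)


def rpr_select(main_out: int, replica_out: int, threshold: int) -> int:
    """Keep the main output unless it deviates too far from the replica."""
    if abs(main_out - replica_out) > threshold:
        return replica_out  # main output is assumed to contain a timing error
    return main_out


# Assumed setup: 4 dropped bits per operand, so the replica of a two-input
# adder is off by at most 2 * (2**4 - 1) = 30 from the exact sum.
DROPPED_BITS = 4
THRESHOLD = 2 * ((1 << DROPPED_BITS) - 1)

replica = reduced_precision_add(1000, 2000, DROPPED_BITS)
correct_main = 1000 + 2000
faulty_main = correct_main ^ (1 << 11)  # model a timing error on a high bit

print(rpr_select(correct_main, replica, THRESHOLD))  # main output is kept
print(rpr_select(faulty_main, replica, THRESHOLD))   # replica output is used
```

In this model a small discrepancy is attributed to the replica's reduced precision and the main result is kept, while a large discrepancy is attributed to a timing error in the main circuit, so the approximate replica value is returned instead.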

Data-driven logic synthesizer for acceleration of forward propagation in artificial neural networks

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357142

K. Mahmoud, W. E. Smith, Mark Fishkin, Timothy N. Miller

Citations: 0

Energy-efficient execution of data-parallel applications on heterogeneous mobile platforms

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357105

Alok Prakash, Siqi Wang, Alexandru Eugen Irimiea, T. Mitra

Abstract: State-of-the-art mobile system-on-chips (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs now include programmable cores such as CPU and GPU with very different functionality. The SoCs also integrate performance-heterogeneous cores with different power-performance characteristics but the same instruction-set architecture, such as ARM big.LITTLE. In this paper, we first explore and establish the combined benefits of functional heterogeneity and performance heterogeneity in improving the power-performance behavior of data-parallel applications. Next, given an application specified in OpenCL, we present a static partitioning strategy to execute the application kernel across CPU and GPU cores, along with voltage-frequency settings for individual cores, so as to obtain the best power-performance tradeoff. We achieve over 19% runtime improvement by exploiting the functional and performance heterogeneities concurrently. In addition, an energy saving of 36% is achieved by using appropriate voltage-frequency settings without significantly degrading the runtime improvement from concurrent execution.

Citations: 44
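
The abstract above does not spell out how the static partitioning is computed. As a rough illustration of the general idea only, the sketch below splits a data-parallel workload between a CPU and a GPU in proportion to their throughputs, so that both devices finish at about the same time. The function name and the throughput figures are hypothetical, and voltage-frequency selection is not modeled.

```python
def split_work(total_items: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Return (cpu_items, gpu_items) for a throughput-proportional split.

    cpu_rate and gpu_rate are assumed, measured throughputs in items per second.
    """
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    gpu_items = round(total_items * gpu_share)
    return total_items - gpu_items, gpu_items


# Hypothetical throughputs: the GPU processes three times as many items per second.
cpu_items, gpu_items = split_work(1000, cpu_rate=100.0, gpu_rate=300.0)
print(cpu_items, gpu_items)
print(cpu_items / 100.0, gpu_items / 300.0)  # per-device run times in seconds
```

With these assumed rates both devices receive work that takes the same time to finish, which is the condition a balanced static split aims for.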

A pre-search assisted ILP approach to analog integrated circuit routing

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357110

Chi-Yu Wu, H. Graeb, Jiang Hu

Citations: 9

Online mechanism for reliability and power-efficiency management of a dynamically reconfigurable core

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357121

S. Srinivasan, I. Koren, S. Kundu

Abstract: Previous studies have shown that the best way to achieve high throughput/Watt of a single-threaded application is by running it on an asymmetric multicore processor (AMP). AMPs feature cores that are tuned for specific workload characteristics. To increase efficiency, the core that offers the best power-performance trade-off for the executing thread is chosen. To reduce the overhead of thread migration, we have previously proposed a morphable core that can morph into multiple core types. In this study, apart from power-performance efficiency, we also consider the reliability of the different core types as indicated by their vulnerability to soft errors. We show that the best core type for power efficiency may not be the best for reliability. Accordingly, we develop a multi-objective thread migration strategy to determine the best core type considering power efficiency and reliability. To support runtime decision making, we have developed online estimators for reliability and power efficiency based on performance monitoring counters. In keeping with the existing literature, we use the architectural vulnerability factor (AVF) as the metric for reliability and instructions-per-second²/Watt as the metric for power efficiency. For the multi-objective optimization we use a Cobb-Douglas production function. Our results indicate that the proposed runtime mechanism for reliability and power efficiency improves, on average, the throughput/Watt of applications by 24% and reduces the Soft-Error Rate (SER) by 12% compared to the best static execution.

Citations: 7
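
To make the multi-objective step in the abstract above more concrete, the sketch below scores core types with a Cobb-Douglas function of the form U = E^alpha * R^(1 - alpha). Everything specific here is an assumption for illustration: the abstract does not give the exact form or exponents used, the reliability term is taken as 1 - AVF so that higher is better, and the core names and numbers are invented.

```python
def cobb_douglas(efficiency: float, reliability: float, alpha: float) -> float:
    """Cobb-Douglas utility; alpha in [0, 1] weights efficiency against reliability."""
    return (efficiency ** alpha) * (reliability ** (1.0 - alpha))


def best_core(cores: dict[str, tuple[float, float]], alpha: float) -> str:
    """Pick the core type with the highest utility.

    cores maps a name to (normalized power efficiency, AVF); both are assumed inputs.
    """
    return max(
        cores,
        key=lambda name: cobb_douglas(cores[name][0], 1.0 - cores[name][1], alpha),
    )


# Hypothetical core types: "big" is more power-efficient here but more vulnerable.
cores = {"big": (4.0, 0.5), "little": (2.0, 0.1)}

print(best_core(cores, alpha=1.0))  # only efficiency counts
print(best_core(cores, alpha=0.0))  # only reliability counts
```

Sweeping alpha between 0 and 1 shows how the preferred core type can change once reliability is weighted alongside power efficiency, which is the trade-off the abstract describes.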

Using M/G/1 queueing models with vacations to analyze virtualized logic computations

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357087

Michael J. Hall, R. Chamberlain

Citations: 4

Effective hardware-level thread synchronization for high performance and power efficiency in application specific multi-threaded embedded processors

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357119

M. Wickramasinghe, Hui Guo

Abstract: Multi-threaded processors interleave the execution of several threads to reduce processor stalling time. Instruction cache misses usually account for a significant fraction of the overall stalling time due to frequent instruction fetches. Apart from incurring extended execution time (hence its direct impact on energy consumption), cache misses also lead to indirect power overheads and increased thread switching due to resulting main memory accesses. Therefore, minimizing instruction cache misses is important especially in designing application specific embedded processors that tend to be compact in size and consume low power. This paper aims to reduce instruction cache misses in a single pipeline processor for applications that offer embarrassing parallelism and enable the same code to be executed by a number of independent threads on different data sets. Such a design can be used as a building block processor for large multicomputer systems. We propose a micro-architectural level multithreading control design, which synchronizes the thread execution to allow cached instructions to be maximally reused by all threads. Our experiments show that our design not only increases the pipeline performance but also reduces the memory access frequency, hence effectively achieving high energy efficiency.

Citations: 3

A fast and energy efficient branch and bound algorithm for NoC task mapping

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357078

Jiashen Li, Yun Pan

Citations: 5

Hybrid scratchpad and cache memory management for energy-efficient parallel HEVC encoding

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357185

Changlai Song, Lei Ju, Zhiping Jia

Abstract: The next-generation video coding standard High Efficiency Video Coding (HEVC) provides better compression rates for high resolution videos compared with H.264, at the cost of significantly increased needs for computation power and memory bandwidth. Therefore, memory subsystem optimization is of paramount importance to support HEVC on resource and energy constrained embedded consumer electronics. In this paper, we present a hybrid on-chip memory architecture with both caches and scratchpad memories (SPMs) for parallel HEVC encoding. A run-time prediction algorithm is proposed to effectively identify the most-frequently accessed memory regions in the search window(s) for processing individual coding tree units (CTUs). Depending on their intra- and inter-core reuses, these regions are loaded into the private or shared SPMs for guaranteed on-chip memory accesses. On the other hand, a relatively small hardware-controlled cache is used for the rest of data accesses. Moreover, an adaptive power gating scheme is proposed to power off SPM sectors with expired load windows to further reduce the on-chip leakage power. Compared with the state-of-the-art solution, experimental results show that our proposed memory management framework supports high speed parallel HEVC processing with substantially smaller on-chip memory size, which achieves up to 76.23% on-chip leakage energy savings, and 33.31% energy saving for the overall memory subsystem.

Citations: 13

VPM: Virtual power meter tool for low-power many-core/heterogeneous data center prototypes

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357177

S. Rethinagiri, Oscar Palomar, J. Moreno, O. Unsal, A. Cristal

Abstract: Power and energy consumption of data centers are steadily increasing and the work performed by the data centers is not proportional to the power dissipated, where every μA is a revenue for the entity. On the one hand, the hardware community is proposing various methodologies to address this issue, such as low-power processors, heterogeneity, etc., to reduce the power of the servers. On the other hand, the software community proposes mechanisms such as virtual machines (VMs), workload scheduling, etc., to increase the utilization of the processor. In order to properly evaluate the impact of these mechanisms, we need an accurate power monitoring and estimation tool at the hardware host level, the VM level and the system level. This paper proposes a novel power monitoring middleware on a low-power platform at the node level (ARM big.LITTLE) and an estimation methodology using a simulator for future data center prototypes at any given level of virtualization. First, we built an instrumentation framework to measure the power based on hardware counter activities and with respect to current fluctuation. This allows us to build power models for the corresponding platforms, which are fed into the middleware to estimate power on the fly. Furthermore, we used the same framework for future low-power processors such as ARM Cortex-A57 and -A53 based platforms, which are integrated into the architectural simulator by providing an API to estimate power with the power model. Second, we present a machine learning-based energy efficient scheduling of the VMs that leverages VPM. The results obtained with the power monitoring middleware differ by less than 2% from real board measurements, and by 5% when using the simulation environment, regardless of the number of virtual machines used. Furthermore, we reduced energy consumption by 40% on average when compared to the default scheduling of the KVM hypervisor.

Citations: 2