2015 33rd IEEE International Conference on Computer Design (ICCD): Latest Papers

An automated design flow for approximate circuits based on reduced precision redundancy
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357088
D. J. Pagliari, A. Calimera, E. Macii, M. Poncino
Abstract: Reduced Precision Redundancy (RPR) is a popular Approximate Computing technique, in which a circuit operated in Voltage Over-Scaling (VOS) is paired with a reduced-bitwidth, faster replica so that VOS-induced timing errors are partially recovered by the replica and their impact is mitigated. Previous works have provided various examples of effective RPR implementations, which however suffer from three limitations: first, these circuits are designed using ad-hoc procedures, and no generalization is provided; second, error impact analysis is carried out statistically, thus neglecting issues like non-elementary data distributions and temporal correlation; last, only dynamic power was considered in the optimization. In this work we propose a new generalized approach to RPR that overcomes all these limitations, leveraging the capabilities of state-of-the-art synthesis and simulation tools. By sacrificing theoretical provability in favor of an empirical input-based analysis, we build a design tool able to automatically add RPR to a preexisting gate-level netlist. Thanks to this method, we are able to refute some of the conclusions drawn in previous works, in particular those related to statistical assumptions on inputs; we show that a given input distribution may yield extremely different results depending on its temporal behavior.
Citations: 3
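The replica-based correction described in the abstract is usually illustrated with the classic RPR decision rule: keep the full-precision output unless it disagrees with the reduced-precision replica by more than a threshold, and fall back to the replica otherwise. The sketch below is a minimal illustration under stated assumptions, not the logic the authors' tool inserts into a netlist: the unsigned adder, the bit-truncation replica, the threshold choice, and all function names are hypothetical, and a VOS timing error is simulated by flipping a high-order bit.

```python
def replica_add(a: int, b: int, drop_bits: int) -> int:
    """Reduced-precision replica: add only the upper bits of two unsigned operands."""
    mask = ~((1 << drop_bits) - 1)  # clears the `drop_bits` least significant bits
    return (a & mask) + (b & mask)


def rpr_threshold(drop_bits: int) -> int:
    """Largest error the replica can make on its own: each truncated operand
    loses at most 2**drop_bits - 1."""
    return 2 * ((1 << drop_bits) - 1)


def rpr_select(main_out: int, replica_out: int, threshold: int) -> int:
    """RPR decision rule: keep the (possibly timing-faulty) main output unless it
    deviates from the replica by more than `threshold`; then use the replica."""
    if abs(main_out - replica_out) > threshold:
        return replica_out
    return main_out


# Usage: 1000 + 2000 with a replica that ignores the 4 least significant bits.
replica = replica_add(1000, 2000, drop_bits=4)        # 2992
th = rpr_threshold(4)                                  # 30
correct = rpr_select(3000, replica, th)                # main output is trusted
faulty = rpr_select(3000 ^ (1 << 11), replica, th)     # simulated VOS error on bit 11
```

With an error-free main adder the exact sum 3000 passes through; when a timing error corrupts a high-order bit, the large deviation triggers the fallback to the replica's approximate 2992.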
Data-driven logic synthesizer for acceleration of Forward propagation in artificial neural networks
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357142
K. Mahmoud, W. E. Smith, Mark Fishkin, Timothy N. Miller
Abstract: We present a tool for automatically generating efficient feed-forward logic for hardware acceleration of artificial neural networks (ANNs). It produces circuitry in the form of synthesizable Verilog code that is optimized, based on an analysis of training data, to minimize the number of bits in weights and values, thereby minimizing the number of logic gates in ANN components such as adders and multipliers. For an optimized ANN, different implementation topologies can be generated, including fully pipelined designs and simple state machines. Additional insights about hardware acceleration for neural networks are also presented. We show the impact of reducing precision relative to floating point and present area, power, delay, throughput, and energy estimates obtained by circuit synthesis.
Citations: 0
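One common way to size fixed-point words from observed data, in the spirit of the bit minimization described above, is to pick the fewest fractional bits that keep quantization error within a tolerance and then the fewest integer bits (including sign) whose two's-complement range covers the quantized values. This is a generic sketch, not the tool's actual analysis; the tolerance, the sample weights, and the function names are hypothetical.

```python
def quantize(value: float, frac_bits: int) -> float:
    """Round a value to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step


def min_frac_bits(values, tol: float, max_bits: int = 32) -> int:
    """Fewest fractional bits keeping every quantization error within `tol`."""
    for f in range(max_bits + 1):
        if all(abs(v - quantize(v, f)) <= tol for v in values):
            return f
    raise ValueError("tolerance not reachable within max_bits")


def min_integer_bits(values, frac_bits: int) -> int:
    """Fewest integer bits (including the sign bit) of a two's-complement
    fixed-point format whose range covers every quantized value."""
    step = 2.0 ** -frac_bits
    quantized = [quantize(v, frac_bits) for v in values]
    i = 1
    while not all(-(2.0 ** (i - 1)) <= q <= 2.0 ** (i - 1) - step for q in quantized):
        i += 1
    return i


weights = [0.75, -1.5, 2.25, 0.1]   # hypothetical trained weights
f = min_frac_bits(weights, tol=0.05)
i = min_integer_bits(weights, f)
```

For these sample weights the search settles on 3 fractional and 3 integer bits, a 6-bit word in place of a 32-bit float; narrower operands translate directly into smaller adders and multipliers.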
Energy-efficient execution of data-parallel applications on heterogeneous mobile platforms
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357105
Alok Prakash, Siqi Wang, Alexandru Eugen Irimiea, T. Mitra
Abstract: State-of-the-art mobile systems-on-chip (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs include programmable cores, such as CPUs and GPUs, with very different functionality. They also integrate performance-heterogeneous cores with different power-performance characteristics but the same instruction-set architecture, such as ARM big.LITTLE. In this paper, we first explore and establish the combined benefits of functional heterogeneity and performance heterogeneity in improving the power-performance behavior of data-parallel applications. Next, given an application specified in OpenCL, we present a static partitioning strategy to execute the application kernel across CPU and GPU cores, along with voltage-frequency settings for individual cores, so as to obtain the best power-performance tradeoff. We achieve over 19% runtime improvement by exploiting the functional and performance heterogeneities concurrently. In addition, an energy saving of 36% is achieved by using appropriate voltage-frequency settings without significantly degrading the runtime improvement from concurrent execution.
Citations: 44
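A simple static split of a data-parallel kernel between CPU and GPU assigns work in proportion to each device's measured throughput, so that both finish at about the same time. The sketch below shows only that idea under assumed, hypothetical throughput numbers; it is not the authors' partitioning strategy and does not model the per-core voltage-frequency selection the paper also performs.

```python
def partition(n_items: int, cpu_rate: float, gpu_rate: float):
    """Split work so CPU and GPU finish at about the same time: each device
    gets a share proportional to its measured throughput (items/s)."""
    n_cpu = round(n_items * cpu_rate / (cpu_rate + gpu_rate))
    return n_cpu, n_items - n_cpu


def makespan(n_cpu: int, n_gpu: int, cpu_rate: float, gpu_rate: float) -> float:
    """Kernel completion time when both devices run concurrently."""
    return max(n_cpu / cpu_rate, n_gpu / gpu_rate)


n_cpu, n_gpu = partition(1000, cpu_rate=100.0, gpu_rate=300.0)   # (250, 750)
t_split = makespan(n_cpu, n_gpu, 100.0, 300.0)                   # 2.5 s
t_gpu_only = makespan(0, 1000, 100.0, 300.0)                     # about 3.33 s
```

Balancing completion times is the design choice here: any other split leaves one device idle while the other is still the bottleneck.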
A pre-search assisted ILP approach to analog integrated circuit routing
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357110
Chi-Yu Wu, H. Graeb, Jiang Hu
Abstract: The routing of analog integrated circuits (ICs) has long been a challenge due to numerous constraints (such as symmetry and topology matching) that matter for overall circuit performance. Existing automatic analog IC routing algorithms can be broadly categorized into two approaches: a sequential approach that heuristically routes one net after another, and constructive ILP (Integer Linear Programming). The former is usually fast but may miss opportunities to find good solutions. Constructive ILP provides optimal solutions but can be very time-consuming. We propose a simple yet efficient method that combines the advantages of both existing approaches. First, sequential routing is performed to obtain a set of candidate routing paths for each net. Then, an ILP is applied to commit each net to only one of its candidate routes. Experiments on two op-amp designs show that the post-layout performance (such as gain and phase margin) of our method is close to that of manual design. Our method also outperforms previous work on automated analog IC routing.
Citations: 9
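The second step described above, committing each net to exactly one pre-computed candidate route, can be written as a binary selection ILP. The formulation below is a sketch under assumptions, not the paper's exact model: $\mathcal{N}$ is the set of nets, $\mathcal{K}_n$ the candidate routes found for net $n$ by the sequential pre-search, $x_{n,k} \in \{0,1\}$ selects route $k$ of net $n$, $c_{n,k}$ is an assumed route cost (for example wirelength or a parasitic estimate), and $\mathcal{C}$ is an assumed set of pairs of candidate routes that cannot coexist (for example because they short or overlap).

```latex
\begin{aligned}
\min_{x}\quad & \sum_{n \in \mathcal{N}} \sum_{k \in \mathcal{K}_n} c_{n,k}\, x_{n,k} \\
\text{s.t.}\quad & \sum_{k \in \mathcal{K}_n} x_{n,k} = 1 && \forall n \in \mathcal{N} \\
& x_{n,k} + x_{m,l} \le 1 && \forall \big((n,k),(m,l)\big) \in \mathcal{C} \\
& x_{n,k} \in \{0,1\} && \forall n \in \mathcal{N},\ k \in \mathcal{K}_n
\end{aligned}
```

Because the ILP only chooses among a few candidates per net instead of constructing routes cell by cell, it stays much smaller than a constructive ILP formulation.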
Online mechanism for reliability and power-efficiency management of a dynamically reconfigurable core
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357121
S. Srinivasan, I. Koren, S. Kundu
Abstract: Previous studies have shown that the best way to achieve high throughput/Watt for a single-threaded application is to run it on an asymmetric multicore processor (AMP). AMPs feature cores that are tuned for specific workload characteristics. To increase efficiency, the core that offers the best power-performance trade-off for the executing thread is chosen. To reduce the overhead of thread migration, we have previously proposed a morphable core that can morph into multiple core types. In this study, apart from power-performance efficiency, we also consider the reliability of the different core types, as indicated by their vulnerability to soft errors. We show that the best core type for power efficiency may not be the best for reliability. Accordingly, we develop a multi-objective thread migration strategy to determine the best core type considering both power efficiency and reliability. To support runtime decision making, we have developed online estimators for reliability and power efficiency based on performance monitoring counters. In keeping with the existing literature, we use the architectural vulnerability factor (AVF) as the metric for reliability and instructions-per-second²/Watt as the metric for power efficiency. For the multi-objective optimization we use a Cobb-Douglas production function. Our results indicate that the proposed runtime mechanism for reliability and power efficiency improves the throughput/Watt of applications by 24% on average and reduces the Soft-Error Rate (SER) by 12% compared to the best static execution.
Citations: 7
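A Cobb-Douglas function combines two objectives multiplicatively as $E^{\alpha} R^{1-\alpha}$, so a core type must do reasonably well on both to score high. The sketch below assumes reliability is taken as $1/\mathrm{AVF}$ and uses hypothetical per-core-type estimates and weight $\alpha$; the paper's actual normalization and exponents are not given in the abstract.

```python
def cobb_douglas_score(power_eff: float, avf: float, alpha: float = 0.5) -> float:
    """Cobb-Douglas combination of power efficiency (e.g. IPS^2/W, higher is
    better) and reliability, taken here as 1/AVF (lower AVF is better)."""
    reliability = 1.0 / avf
    return (power_eff ** alpha) * (reliability ** (1.0 - alpha))


def best_core(candidates: dict, alpha: float = 0.5) -> str:
    """Pick the core type with the highest combined score."""
    return max(candidates, key=lambda name: cobb_douglas_score(*candidates[name], alpha=alpha))


# Hypothetical per-core-type estimates: (normalized IPS^2/W, AVF).
cores = {"wide_ooo": (4.0, 0.40), "narrow_inorder": (2.0, 0.10)}

power_only = best_core(cores, alpha=1.0)   # ignores reliability
balanced = best_core(cores, alpha=0.5)     # weighs both objectives equally
```

With $\alpha = 1$ the wide core wins on power efficiency alone, while the balanced weighting prefers the narrow core because of its much lower AVF, mirroring the abstract's observation that the most power-efficient core type need not be the most reliable.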
Using M/G/1 queueing models with vacations to analyze virtualized logic computations
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357087
Michael J. Hall, R. Chamberlain
Abstract: Virtualization of logic computations (i.e., sharing a fixed function across distinct data streams) provides a means to effectively utilize hardware resources by context switching the logic to support multiple data streams of computation and to improve the total throughput of all streams. Context switching allows the pipeline stages of the logic to be fully utilized when feedback is present, and allows additional contexts to be supported using secondary memory. In this paper, we analyze the performance of a virtualized hardware design and develop M/G/1 queueing model equations to predict circuit performance. The server is modeled using a general distribution and takes vacations during the computation of an individual data stream. Using the model, we predict circuit performance and tune a schedule for optimal performance.
Citations: 4
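For orientation, the textbook M/G/1 queue with multiple vacations and exhaustive service has a mean queueing delay that decomposes into the ordinary Pollaczek-Khinchine term plus the mean residual vacation: $W_q = \frac{\lambda E[S^2]}{2(1-\rho)} + \frac{E[V^2]}{2E[V]}$ with $\rho = \lambda E[S]$. The sketch below evaluates that standard result; it is not the authors' own equations for their virtualized circuit, and the parameter values are hypothetical.

```python
def mg1_vacation_wait(lam, es, es2, ev, ev2):
    """Mean queueing delay of an M/G/1 queue with multiple vacations.

    lam: arrival rate; es, es2: first and second moments of service time;
    ev, ev2: first and second moments of vacation length.
    Decomposition: ordinary M/G/1 (Pollaczek-Khinchine) wait plus the mean
    residual vacation time E[V^2] / (2 E[V]).
    """
    rho = lam * es
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    return lam * es2 / (2 * (1 - rho)) + ev2 / (2 * ev)


def mg1_vacation_response(lam, es, es2, ev, ev2):
    """Mean time in system = queueing delay + mean service time."""
    return mg1_vacation_wait(lam, es, es2, ev, ev2) + es


# Deterministic service of 1 time unit and deterministic vacations of 2 units.
w = mg1_vacation_wait(0.5, es=1.0, es2=1.0, ev=2.0, ev2=4.0)      # 0.5 + 1.0 = 1.5
t = mg1_vacation_response(0.5, es=1.0, es2=1.0, ev=2.0, ev2=4.0)  # 2.5
```

The second term makes the cost of vacations explicit: longer or more variable vacations (here, time the shared logic spends on other contexts) add directly to every stream's waiting time.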
Effective hardware-level thread synchronization for high performance and power efficiency in application specific multi-threaded embedded processors
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357119
M. Wickramasinghe, Hui Guo
Abstract: Multi-threaded processors interleave the execution of several threads to reduce processor stalling time. Instruction cache misses usually account for a significant fraction of the overall stalling time due to frequent instruction fetches. Apart from extending execution time (and hence directly increasing energy consumption), cache misses also lead to indirect power overheads and increased thread switching due to the resulting main memory accesses. Therefore, minimizing instruction cache misses is important, especially in designing application-specific embedded processors, which tend to be compact in size and consume low power. This paper aims to reduce instruction cache misses in a single-pipeline processor for applications that offer embarrassing parallelism and enable the same code to be executed by a number of independent threads on different data sets. Such a design can be used as a building-block processor for large multicomputer systems. We propose a micro-architectural-level multithreading control design, which synchronizes thread execution to allow cached instructions to be maximally reused by all threads. Our experiments show that our design not only increases pipeline performance but also reduces memory access frequency, hence effectively achieving high energy efficiency.
Citations: 3
A fast and energy efficient branch and bound algorithm for NoC task mapping
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357078
Jiashen Li, Yun Pan
Abstract: This paper proposes an enhanced Branch and Bound (B&B) algorithm for Network-on-Chip (NoC) task mapping. The novelty of the algorithm can be summarized in two aspects. First, a more accurate method is proposed to estimate the lower-bound cost. Second, an automatic method to generate task binding rules is proposed based on the Task Binding Graph (TBG). Both improvements contribute to a high-speed B&B algorithm with globally optimized mapping results, aiming to reduce communication energy consumption. The experimental results show that, on average, the proposed algorithm is nearly 3.5 times faster and its communication energy consumption is 35% lower than that of the state-of-the-art B&B algorithm. Compared to the Genetic Algorithm, the proposed algorithm is similarly fast and reduces communication energy consumption by 24% on average. In particular, the advantages of the proposed algorithm become more significant as the NoC grows larger.
Citations: 5
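To make the B&B idea concrete, the sketch below maps tasks onto a 2D mesh exactly, using the cost of the partial mapping as the lower bound: since placing more tasks can only add communication, any branch whose partial cost already reaches the best complete mapping is pruned. The cost model (traffic volume times Manhattan hop count, as a stand-in for communication energy), the example graph, and all names are assumptions; the paper's tighter lower-bound estimate and TBG-derived binding rules are not reproduced.

```python
from itertools import product


def comm_cost(mapping, edges):
    """Communication cost: sum of volume x Manhattan hop count over edges
    whose endpoints are both already mapped onto mesh tiles."""
    total = 0
    for (src, dst), volume in edges.items():
        if src in mapping and dst in mapping:
            (x1, y1), (x2, y2) = mapping[src], mapping[dst]
            total += volume * (abs(x1 - x2) + abs(y1 - y2))
    return total


def bb_map(tasks, edges, width, height):
    """Exact task-to-tile mapping on a width x height mesh by branch and bound."""
    tiles = list(product(range(width), range(height)))
    best = {"cost": float("inf"), "mapping": None}

    def search(i, mapping, used):
        # The partial cost never decreases as more tasks are placed,
        # so it is a valid (if loose) lower bound for every completion.
        bound = comm_cost(mapping, edges)
        if bound >= best["cost"]:
            return  # prune: this branch cannot beat the incumbent
        if i == len(tasks):
            best["cost"], best["mapping"] = bound, dict(mapping)
            return
        for tile in tiles:
            if tile not in used:
                mapping[tasks[i]] = tile
                used.add(tile)
                search(i + 1, mapping, used)
                used.remove(tile)
                del mapping[tasks[i]]

    search(0, {}, set())
    return best["cost"], best["mapping"]


tasks = ["A", "B", "C", "D"]
edges = {("A", "B"): 10, ("B", "C"): 10, ("C", "D"): 10, ("D", "A"): 1}
cost, mapping = bb_map(tasks, edges, width=2, height=2)   # ring fits: every edge is 1 hop
```

A tighter lower bound, such as the one the paper proposes, prunes earlier and is what makes B&B practical beyond toy mesh sizes.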
Hybrid scratchpad and cache memory management for energy-efficient parallel HEVC encoding
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357185
Changlai Song, Lei Ju, Zhiping Jia
Abstract: The next-generation video coding standard High Efficiency Video Coding (HEVC) provides better compression rates for high-resolution videos than H.264, at the cost of significantly increased demands for computation power and memory bandwidth. Therefore, memory subsystem optimization is of paramount importance for supporting HEVC on resource- and energy-constrained embedded consumer electronics. In this paper, we present a hybrid on-chip memory architecture with both caches and scratchpad memories (SPMs) for parallel HEVC encoding. A run-time prediction algorithm is proposed to effectively identify the most frequently accessed memory regions in the search window(s) for processing individual coding tree units (CTUs). Depending on their intra- and inter-core reuse, these regions are loaded into private or shared SPMs for guaranteed on-chip memory accesses. A relatively small hardware-controlled cache is used for the remaining data accesses. Moreover, an adaptive power gating scheme is proposed to power off SPM sectors with expired load windows to further reduce on-chip leakage power. Compared with the state-of-the-art solution, experimental results show that our proposed memory management framework supports high-speed parallel HEVC processing with a substantially smaller on-chip memory, achieving up to 76.23% on-chip leakage energy savings and 33.31% energy savings for the overall memory subsystem.
Citations: 13
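The SPM/cache split described above can be illustrated by a simple frequency-based predictor: count accesses per fixed-size memory region in a previous trace, keep the hottest regions that fit in the scratchpad, and leave everything else to the cache. This is a hypothetical sketch, not the paper's run-time prediction algorithm (which works on HEVC search windows and distinguishes private from shared SPMs); the trace, region size, and names are assumptions.

```python
from collections import Counter


def select_spm_regions(access_trace, region_size, spm_regions):
    """Predict hot memory regions from an address trace (e.g. of the previous
    CTU): count accesses per fixed-size region and keep the most frequent ones
    that fit in the scratchpad."""
    counts = Counter(addr // region_size for addr in access_trace)
    return [region for region, _ in counts.most_common(spm_regions)]


def split_accesses(access_trace, region_size, spm_set):
    """Count accesses served by the SPM versus those left to the cache."""
    spm = sum(1 for addr in access_trace if addr // region_size in spm_set)
    return spm, len(access_trace) - spm


trace = [0, 1, 2, 64, 65, 128, 129, 130, 131, 200]   # hypothetical byte addresses
hot = select_spm_regions(trace, region_size=64, spm_regions=2)
spm_hits, cache_accesses = split_accesses(trace, 64, set(hot))
```

Here regions 2 and 0 are selected, so 7 of the 10 accesses are served by the SPM and only 3 go through the cache, which is why a small cache can suffice for the remainder.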
VPM: Virtual power meter tool for low-power many-core/heterogeneous data center prototypes
2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357177
S. Rethinagiri, Oscar Palomar, J. Moreno, O. Unsal, A. Cristal
Abstract: Power and energy consumption of data centers are steadily increasing, and the work performed by data centers is not proportional to the power dissipated, where every μA is revenue for the entity. On the one hand, the hardware community is proposing various methodologies to address this issue, such as low-power processors and heterogeneity, to reduce the power of the servers. On the other hand, the software community proposes mechanisms such as virtual machines (VMs) and workload scheduling to increase processor utilization. In order to properly evaluate the impact of these mechanisms, we need an accurate power monitoring and estimation tool at the hardware host level, the VM level, and the system level. This paper proposes a novel power monitoring middleware on a low-power platform at the node level (ARM big.LITTLE) and an estimation methodology that uses a simulator for future data center prototypes at any given level of virtualization. First, we built an instrumentation framework to measure power based on hardware counter activities and with respect to current fluctuation. This allows us to build power models for the corresponding platforms, which are fed into the middleware to estimate power on the fly. Furthermore, we used the same framework for future low-power processors, such as ARM Cortex-A57 and Cortex-A53 based platforms, which are integrated into the architectural simulator through an API that estimates power with the power model. Second, we present a machine-learning-based, energy-efficient scheduling of the VMs that leverages VPM. The results obtained with the power monitoring middleware differ by less than 2% from real board measurements, and by less than 5% when using the simulation environment, regardless of the number of virtual machines used. Furthermore, we reduced energy consumption by 40% on average compared to the default scheduling of the KVM hypervisor.
Citations: 2
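Counter-based power models of the kind described above are often linear: $P = \beta_0 + \sum_i \beta_i x_i$, where each $x_i$ is a hardware-counter rate, fitted by least squares against measured power. The sketch below fits such a model with the normal equations in plain Python. It is an assumption-laden illustration, not VPM's actual model: the two counter features and the training data are hypothetical (generated from known coefficients so the fit can be checked).

```python
def fit_linear_power_model(samples, powers):
    """Least-squares fit of P = b0 + sum_i b_i * x_i via the normal equations."""
    rows = [[1.0] + list(x) for x in samples]          # prepend intercept column
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]  # X^T X
    v = [sum(r[i] * p for r, p in zip(rows, powers)) for i in range(n)]        # X^T y
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("counter features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            v[r] -= factor * v[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (v[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return coef


def predict_power(coef, x):
    """Estimate power on the fly from a new vector of counter rates."""
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))


# Hypothetical normalized counter rates (e.g. instructions, cache misses).
samples = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
powers = [1.0 + 2.0 * x1 + 0.5 * x2 for x1, x2 in samples]
coef = fit_linear_power_model(samples, powers)
```

Once fitted offline, evaluating the model is just a dot product per sampling interval, which is what makes on-the-fly estimation in middleware cheap.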