2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Papers

RExCache: Rapid exploration of unified last-level cache

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509661

S. Min, Haris Javaid, S. Parameswaran

In this paper, we propose to explore the design space of a unified last-level cache to improve system performance and energy efficiency. The challenge is to quickly estimate the execution time and energy consumption of the system under distinct cache configurations using a minimal number of slow, full-system, cycle-accurate simulations. To this end, we propose a novel, simple yet highly accurate execution time estimator and a simple, reasonably accurate energy estimator. Our framework, RExCache, combines a cycle-accurate simulator and a trace-driven cache simulator with our novel execution time and energy estimators to avoid cycle-accurate simulation of every last-level cache configuration. Once the estimators provide execution time and energy estimates, RExCache chooses the cache configuration with the minimum execution time or the minimum energy consumption. Our experiments with nine different applications from MediaBench and 330 last-level cache configurations show that the execution time and energy estimators achieved average absolute accuracies of at least 99.74% and 80.31%, respectively. RExCache took only a few hours (21 hours for H.264enc) to explore the last-level cache configurations, compared with the days required by the traditional method (36 days for H.264enc) and by cycle-accurate simulation (257 days for H.264enc), enabling quick exploration of the last-level cache. When 100 different real-time constraints on execution time and energy were used, all the cache configurations found by RExCache were similar to those found by cycle-accurate simulation. In contrast, the traditional method found the correct cache configuration for only 69 of the 100 constraints. Thus, RExCache achieves better absolute accuracy than the traditional method while reducing simulation time by at least 97%.

Citations: 8
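The final step described above, picking the minimum-time or minimum-energy configuration once estimates exist, can be pictured as a constrained minimum search over the estimated design points. This is a minimal sketch under stated assumptions: the per-configuration estimates are taken as already computed (the paper's estimators are not reproduced), and the `Estimate` record, the function names, the configuration labels, and all numbers are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Estimate:
    """Estimated cost of one last-level cache configuration (hypothetical record)."""
    config: str         # illustrative label, e.g. "512KB/8-way/64B"
    exec_time_s: float  # estimated execution time in seconds
    energy_j: float     # estimated energy in joules


def min_energy_under_time(estimates: Sequence[Estimate],
                          time_limit_s: float) -> Optional[Estimate]:
    """Lowest-energy configuration that meets the execution-time limit, or None."""
    feasible = [e for e in estimates if e.exec_time_s <= time_limit_s]
    return min(feasible, key=lambda e: e.energy_j, default=None)


def min_time_under_energy(estimates: Sequence[Estimate],
                          energy_limit_j: float) -> Optional[Estimate]:
    """Fastest configuration that stays within the energy budget, or None."""
    feasible = [e for e in estimates if e.energy_j <= energy_limit_j]
    return min(feasible, key=lambda e: e.exec_time_s, default=None)


# Made-up design points, not results from the paper.
EXAMPLE = [
    Estimate("256KB/4-way/32B", exec_time_s=2.0, energy_j=5.0),
    Estimate("512KB/8-way/64B", exec_time_s=1.5, energy_j=6.0),
    Estimate("1MB/8-way/64B", exec_time_s=1.2, energy_j=8.0),
]

if __name__ == "__main__":
    print(min_energy_under_time(EXAMPLE, time_limit_s=1.6).config)    # 512KB/8-way/64B
    print(min_time_under_energy(EXAMPLE, energy_limit_j=5.5).config)  # 256KB/4-way/32B
```

Returning `None` when no configuration satisfies a constraint is a choice made for this sketch; how the paper handles infeasible constraints is not stated in the abstract.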

Curling-PCM: Application-specific wear leveling for phase change memory based embedded systems

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509609

Duo Liu, Tianzheng Wang, Yi Wang, Z. Shao, Qingfeng Zhuge, E. Sha

Phase change memory (PCM) has been used as a NOR flash replacement in embedded systems because of its attractive features. However, PCM endurance keeps drifting downward, which greatly limits its adoption in embedded systems. Because most embedded systems are application-oriented, PCM can be better utilized by exploiting application-specific features, such as fixed access patterns and update frequencies, to prolong its lifetime. In this paper, we propose an application-specific wear-leveling technique, called Curling-PCM, that evenly distributes write activity across the PCM chip to improve PCM endurance. The basic idea is to exploit application-specific features in embedded systems and periodically move the hot region across the whole PCM chip. To further reduce the overhead of moving the hot region and improve the performance of PCM-based embedded systems, Curling-PCM adopts a fine-grained partial wear-leveling policy in which only part of the hot region is moved during each request-handling period. Experimental results show that Curling-PCM distributes write traffic across PCM chips more evenly than previous work. We expect this work to serve as a first step toward fully exploring application-specific features in PCM-based embedded systems.

Citations: 61
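The idea of periodically sliding the hot region across the chip, moving only part of it per period, can be illustrated with a toy wear simulation. This is a minimal sketch under stated assumptions, not Curling-PCM's actual algorithm: the chip is modeled as `num_blocks` equal-sized blocks, the application writes uniformly to a contiguous hot region of `hot_size` blocks, each relocated block is assumed to cost one extra write at its destination, and the function name `simulate_wear` and its parameters are hypothetical.

```python
from collections import deque
from typing import List


def simulate_wear(num_blocks: int, hot_size: int, periods: int,
                  writes_per_block: int, step: int) -> List[int]:
    """Return per-block write counts for a toy sliding hot-region model.

    Hypothetical model: application writes hit only the hot region; after
    each request-handling period, `step` hot blocks are relocated to the
    physical blocks just past the region (step=0 disables leveling).
    """
    if not 0 < hot_size < num_blocks:
        raise ValueError("hot region must be non-empty and smaller than the chip")
    wear = [0] * num_blocks
    hot = deque(range(hot_size))  # physical blocks of the hot region, oldest first
    next_free = hot_size          # first physical block past the hot region
    for _ in range(periods):
        # Application writes land only in the fixed-size hot region.
        for blk in hot:
            wear[blk] += writes_per_block
        # Partial wear leveling: relocate only `step` blocks, not the whole region.
        for _ in range(step):
            hot.popleft()                # oldest block leaves the hot region
            hot.append(next_free)        # its data moves to the next physical block
            wear[next_free] += 1         # assumed cost: one write per copied block
            next_free = (next_free + 1) % num_blocks
    return wear


if __name__ == "__main__":
    print(simulate_wear(8, 2, 8, 10, step=0))  # [80, 80, 0, 0, 0, 0, 0, 0]
    print(simulate_wear(8, 2, 8, 10, step=1))  # [21, 21, 21, 21, 21, 21, 21, 21]
```

Without leveling, all writes stay on two blocks; sliding one block per period spreads the same workload evenly at the cost of one copy write per period. A larger `step` levels wear faster but moves more data each period, which is the overhead the partial policy described in the abstract aims to limit.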

Optimization of overdrive signoff

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509619

T. Chan, A. Kahng, Jiajia Li, S. Nath

In modern SoC implementations, multi-mode design is commonly used to achieve better circuit performance and power across voltage-scaling, “turbo”, and other operating modes. Although there are many tools for multi-mode circuit implementation, to our knowledge no systematic analysis or methodology is available for selecting the associated signoff modes. We observe that the selection of signoff modes has a significant impact on circuit area, power, and performance. For example, an incorrect choice of signoff voltages for the required overdrive frequencies can result in a netlist that is 15% suboptimal in power or 21% suboptimal in area. In this paper, we propose the concept of mode dominance, which can be used as a guideline for signoff mode selection. We also propose efficient circuit implementation flows that optimize the selection of signoff modes in several distinct use cases. Our results show that the proposed methodology provides a 5-7% performance improvement over the traditional “signoff and scale” method. Compared with the optimal signoff modes, the signoff modes determined by our methods incur only 0.6% performance overhead and 8% power overhead after implementation.

Citations: 6

High-level synthesis of multiple dependent CUDA kernels on FPGA

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509613

S. Gurumani, Hisham Cholakkal, Yun Liang, K. Rupnow, Deming Chen

High-level synthesis (HLS) tools automatically generate register-transfer-level (RTL) hardware from algorithm descriptions written in high-level languages, enabling faster creation of custom accelerators for FPGA architectures. Existing HLS tools support a wide variety of input languages and assist users in design space exploration through automation and feedback on performance bottlenecks. This design space exploration applies techniques such as pipelining, partitioning, and resource sharing to improve performance and resource utilization. However, although automated exploration can find some inherent parallelism, data-parallel input source code is still better at exposing a greater variety of parallelism. In prior work, we demonstrated automated design space exploration of GPU multi-threaded (CUDA) source code for efficient RTL generation. In this paper, we examine the challenges of extending this automated design space exploration to multiple dependent CUDA kernels, present a step-by-step procedure for efficiently performing multi-kernel synthesis, and demonstrate the potential of this approach through a case study of a stereo matching algorithm. This study shows that HLS of multiple dependent CUDA kernels can maintain performance parity with the GPU implementation while consuming over 16x less energy than the GPU. Based on our manual procedure, we identify the key challenges in fully automating the synthesis of multi-kernel CUDA programs.

Citations: 19

Unconditionally stable explicit method for the fast 3-D simulation of on-chip power distribution network with through silicon via

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509550

T. Sekine, H. Asai

Citations: 4

MIXSyn: An efficient logic synthesis methodology for mixed XOR-AND/OR dominated circuits

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509585

L. Amarù, P. Gaillardon, G. De Micheli

Citations: 26

Heterogeneous memory management for 3D-DRAM and external DRAM with QoS

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509676

L. Tran, F. Kurdahi, A. Eltawil, H. Homayoun

This paper presents an innovative memory management approach that utilizes both 3D-DRAM and external DRAM (ex-DRAM). Our approach dynamically allocates and relocates memory blocks between the 3D-DRAM and the ex-DRAM to exploit the high bandwidth and low latency of the 3D-DRAM as well as the high capacity and low cost of the ex-DRAM. Our simulations show that for workloads that are not memory intensive, our technique transfers all active memory blocks to the 3D-DRAM, which is faster than the ex-DRAM. For memory-intensive workloads, our technique uses both the 3D-DRAM and the ex-DRAM to increase memory bandwidth and alleviate bandwidth congestion. Our approach supports Quality of Service (QoS) for “latency-sensitive”, “bandwidth-sensitive”, and “insensitive” applications. To improve performance and satisfy a given level of QoS, memory blocks of different application types are allocated differently. Compared with the scratchpad memory management mechanism, our approach decreases average memory access latency by 19% and 23% and improves performance by up to 5% and 12% in single-threaded and multi-threaded benchmarks, respectively. Moreover, with our approach, applications do not need to manage memory explicitly as they do in the scratchpad case. Our memory block relocation incurs negligible performance overhead, particularly for applications with high spatial locality.

Citations: 19

HS3DPG: Hierarchical simulation for 3D P/G network

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509647

Shuai Tao, Xiaoming Chen, Yu Wang, Yuchun Ma, Yiyu Shi, Hui Wang, Huazhong Yang

Citations: 5

Application-specific fault-tolerant architecture synthesis for digital microfluidic biochips

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509697

M. Alistar, P. Pop, J. Madsen

Microfluidic-based biochips are replacing conventional biochemical analyzers and can integrate on-chip all the functions needed for biochemical analysis using microfluidics. Digital microfluidic biochips manipulate liquids not as a continuous flow but as discrete droplets on an array of electrodes. Microfluidic operations such as transport, mixing, and splitting are performed on this array by routing the corresponding droplets over a series of electrodes. Researchers have proposed several approaches for the synthesis of digital microfluidic biochips. All previous work assumes that the biochip architecture is given, and most approaches consider a rectangular electrode array. However, non-regular application-specific architectures are common in practice. Hence, in this paper, we propose an approach to application-specific architecture synthesis. Our approach can also help designers increase yield by introducing redundant electrodes to tolerate permanent faults. The proposed architecture synthesis algorithm has been evaluated on several benchmarks.

Citations: 13

A sub-harmonic injection-locked frequency synthesizer with frequency calibration scheme for use in 60GHz TDD transceivers

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509574

T. Siriburanon, W. Deng, Ahmed Musa, K. Okada, A. Matsuzawa

Citations: 1