2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC): Latest Papers

Invited: Cross-layer approximate computing: From logic to architectures
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2906199
M. Shafique, R. Hafiz, Semeen Rehman, Walaa El-Harouni, J. Henkel
Abstract: We present a survey of approximate techniques and discuss concepts for building power-/energy-efficient computing components ranging from approximate accelerators to arithmetic blocks (like adders and multipliers). We provide a systematic understanding of how to generate and explore the design space of approximate components, which enables a wide range of power/energy, performance, area, and output quality tradeoffs, and a high degree of design flexibility to facilitate their design. To enable cross-layer approximate computing, bridging the gap between the logic layer (i.e., arithmetic blocks) and the architecture layer (and even considering the software layers) is crucial. Towards this end, this paper introduces open-source libraries of low-power and high-performance approximate components. The elementary approximate arithmetic blocks (adder and multiplier) are used to develop multi-bit approximate arithmetic blocks and accelerators. An analysis of data-driven resilience and error propagation is discussed. The approximate computing components are a first step towards a systematic approach to introducing approximate computing paradigms at all levels of abstraction.
Citations: 173
A model-driven approach to warp/thread-block level GPU cache bypassing
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2897966
Hongwen Dai, C. Li, Huiyang Zhou, Saurabh Gupta, Christos Kartsaklis, Mike Mantor
Abstract: The large number of memory requests from massive numbers of threads can easily cause cache contention and cache-miss-related resource congestion on GPUs. This paper proposes a simple yet effective performance model to estimate the impact of cache contention and resource congestion as a function of the number of warps/thread blocks (TBs) that bypass the cache. We then design a hardware-based dynamic warp/thread-block level GPU cache bypassing scheme, which achieves a 1.68x speedup on average over the baseline on a set of memory-intensive benchmarks. Compared to prior work, our scheme achieves a 21.6% performance improvement over SWL-best [29] and 11.9% over CBWT-best [4] on average.
Citations: 19
TEMP: Thread batch enabled memory partitioning for GPU
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2898103
Mengjie Mao, Wujie Wen, Xiaoxiao Liu, J. Hu, Danghui Wang, Yiran Chen, Hai Helen Li
Abstract: As massive multi-threading in GPUs imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP) to improve GPU performance by improving memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks sharing the same set of pages into a thread batch and dispatches the entire thread batch to a streaming multiprocessor. TEMP separates the memory access streams of different thread batches through OS memory management, preserving the intrinsic locality of thread batches and increasing memory access parallelism. Experimental results show that TEMP can obtain up to 10.3% performance improvement and 14.6% DRAM energy reduction compared to a state-of-the-art scheduler without any memory-side optimizations.
Citations: 8
Probabilistic bug-masking analysis for post-silicon tests in microprocessor verification
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2898072
Doowon Lee, Tom Kolan, A. Morgenshtein, V. Sokhin, Ronny Morad, A. Ziv, V. Bertacco
Abstract: Post-silicon validation has become essential in catching hard-to-detect, rarely occurring bugs that have slipped through pre-silicon verification. Post-silicon validation flows, however, are challenged by limited signal observability, which impairs their ability to diagnose and detect bugs. Indeed, bug manifestations during the execution of constrained-random tests may be masked and thus unobservable from the test's outputs. The ability to evaluate the bug-masking rate of a test provides great value in generating and/or selecting effective tests for high-coverage regressions. To this end, we propose an efficient, static bug-masking analysis solution, called BugMAPI. BugMAPI tracks the information flow in a test program and estimates the probability that bugs go undetected by the checking mechanisms in place in the post-silicon platform. To achieve this goal, we leverage static code analysis and a novel, lightweight probability estimation algorithm. We evaluated BugMAPI on a range of industrial constrained-random tests and a range of bug injection models, and we found that it can estimate bug-masking rates with an accuracy of 77% in three orders of magnitude less time than an ideal dynamic analysis solution.
Citations: 4
Quest for high-performance bufferless NoCs with single-cycle express paths and self-learning throttling
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2898075
Bhavya K. Daya, L. Peh, A. Chandrakasan
Abstract: Router buffers are the main reason for the scalable bandwidth of a Network-on-Chip (NoC), but they consume significant area and power. The SCEPTER bufferless NoC sets up single-cycle virtual express paths dynamically, allowing packets to traverse non-minimal paths without latency penalty. Using prioritization, bypassing, and throttling mechanisms, we maximize opportunities to use these paths while pushing bandwidth. For 64 and 256 nodes, we achieve 62% lower latency, 1.3× higher throughput, and 35% lower starvation over a baseline bufferless NoC for synthetic traffic. Full-system 36-core simulations show 19% lower runtime and performance on par with a buffered network, with 36% lower area and 33% lower power.
Citations: 21
Invited - A box of dots: Using scan-based path delay test for timing verification
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2905001
A. Crouch, John C. Potter
Abstract: In this paper, we describe the use of manufacturing scan-based vectors to structurally assess the frequency of any given semiconductor design, as opposed to the complex and costly effort of creating a functional set of vectors that exercises all of the functions needed to accurately determine whether the chip really operates at its rated or advertised frequency. Structural techniques reduce the problem to a finite, measurable, and deterministic set of tests, whereas functional vectors can be somewhat subjective unless analyzed, simulated, and assessed. The techniques described here were developed on microprocessor designs and then expanded to cover the general case of an ASIC, SoC, and even FPGA by using static timing analysis (STA), automatic test pattern generation (ATPG) against a path-delay fault model, path selection from STA, and path filtering to eliminate false paths that would result in an incorrect frequency assessment.
Citations: 4
Invited: Towards fail-operational Ethernet based in-vehicle networks
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2905021
Mischa Möstl, Daniel Thiele, R. Ernst
Abstract: In the future, vehicles are expected to act more and more autonomously. The transition towards highly automated and autonomous driving will raise the safety requirements for in-vehicle networks. Such networks must support isolation between mixed-critical traffic (e.g., critical control and non-critical infotainment) and must be fail-operational. This paper will present new concepts and mechanisms to achieve these goals in Ethernet-based networks. It will cover advanced topics such as software-defined networking (SDN) to implement isolation, fault recovery, and controlled degradation, e.g., to maintain (degraded) operation until the driver takes over or to reach a safe stop.
Citations: 13
Architecting energy-efficient STT-RAM based register file on GPGPUs via delta compression
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2897989
Hang Zhang, Xuhao Chen, Nong Xiao, Fang Liu
Abstract: To facilitate efficient context switches, GPUs usually employ a large-capacity register file to accommodate a massive amount of context information. However, the large register file introduces high power consumption, owing to the high leakage power of SRAM cells. Emerging non-volatile STT-RAM memory has recently been studied as a potential replacement to alleviate the leakage challenge when constructing register files on GPUs. Unfortunately, due to the long write latency and high energy consumption associated with write operations in STT-RAM, simply replacing SRAM with STT-RAM for register files would incur non-trivial performance overhead and bring only marginal energy benefits. In this paper, we propose to optimize STT-RAM based GPU register files for better energy efficiency and performance via two techniques. First, we employ a lightweight compression framework that is aware of register value similarity. It is coupled with a group-based write driver control to mitigate the high energy overhead caused by STT-RAM writes. Second, to address the long write latency of STT-RAM, we propose a centralized SRAM-based write buffer design that efficiently absorbs STT-RAM writes with better buffer utilization than the conventional design with distributed per-bank write buffers. The experimental results show that our STT-RAM based register file design consumes only 37.4% energy relative to the SRAM baseline, while incurring only negligible performance degradation.
Citations: 23
High-level synthesis for micro-electrode-dot-array digital microfluidic biochips
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2898028
Zipeng Li, Kelvin Yi-Tse Lai, Po-Hsien Yu, Tsung-Yi Ho, K. Chakrabarty, Chen-Yi Lee
Abstract: A digital microfluidic biochip (DMFB) is an attractive technology platform for automating laboratory procedures in biochemistry. However, today's DMFBs suffer from several limitations: (i) constraints on droplet size and the inability to vary droplet volume in a fine-grained manner; (ii) the lack of integrated sensors for real-time detection; (iii) the need for special fabrication processes and reliability/yield concerns. To overcome these problems, DMFBs based on a micro-electrode-dot-array (MEDA) architecture have recently been demonstrated. However, due to the inherent differences between today's DMFBs and MEDA, existing synthesis solutions cannot be used for MEDA-based biochips. We present the first biochip synthesis approach that can be used for MEDA. The proposed synthesis method targets operation scheduling, module placement, routing of droplets of various sizes, and diagonal movement of droplets in a two-dimensional array. Simulation results using benchmarks and experimental results using a fabricated MEDA biochip demonstrate the effectiveness of the proposed co-optimization technique.
Citations: 53
Spectral graph sparsification in nearly-linear time leveraging efficient spectral perturbation analysis
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) Pub Date: 2016-06-05 DOI: 10.1145/2897937.2898094
Zhuo Feng
Abstract: Spectral graph sparsification aims to find an ultra-sparse subgraph whose Laplacian matrix closely approximates the original Laplacian matrix in terms of its eigenvalues and eigenvectors. The resulting sparsified subgraph can be efficiently leveraged as a proxy in a variety of numerical computation applications and graph-based algorithms. This paper introduces a practically efficient, nearly-linear time spectral graph sparsification algorithm that can immediately lead to the development of nearly-linear time symmetric diagonally-dominant (SDD) matrix solvers. Our spectral graph sparsification algorithm efficiently builds an ultra-sparse subgraph from a spanning tree subgraph by adding a few "spectrally-critical" off-tree edges back to the spanning tree. This is enabled by a novel spectral perturbation approach and approximately preserves key spectral properties of the original graph Laplacian. Extensive experimental results confirm the nearly-linear runtime scalability of an SDD matrix solver for large-scale, real-world problems, such as VLSI, thermal, and finite-element analysis problems. For instance, a sparse SDD matrix with 40 million unknowns and 180 million nonzeros can be solved (1E-3 accuracy level) within two minutes using a single CPU core and about 6GB of memory.
Citations: 27