Proceedings of the 39th International Conference on Computer-Aided Design: Latest Articles

AxHLS
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415732
Jorge Castro-Godínez, Julián Mateus-Vargas, M. Shafique, Jörg Henkel
Abstract: With the emergence of approximate computing as a design paradigm, many approximate functional units have been proposed, particularly approximate adders and multipliers. These circuits compromise the accuracy of their results within a tolerable limit to reduce the required computational effort and energy. However, given the growing number of such approximate circuits reported in the literature, selecting those that minimize the resources required for designing and generating an approximate accelerator from a high-level specification, while satisfying a defined accuracy constraint, is a joint high-level synthesis (HLS) and design space exploration (DSE) challenge. In this paper, we propose a novel automated framework for HLS of approximate accelerators using a given library of approximate functional units. Since repetitive circuit synthesis and gate-level simulations require a significant amount of time, to enable our framework, we present AxME, a set of analytical models for estimating the required computational resources when using approximate adders and multipliers in approximate designs. We propose DSEwam, a DSE methodology for error-tolerant applications, in which analytical models, such as AxME, are used to estimate the resources needed and the accuracy of approximate designs. Furthermore, we integrate DSEwam into an HLS tool to automatically generate Pareto-optimal, or near Pareto-optimal, approximate accelerators from C language descriptions, for a given error threshold and minimization goal. We release our DSE framework as an open-source contribution, which will significantly boost research and development in the field of automatic generation of approximate accelerators.
Citations: 19
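As a rough illustration of the selection step described in the AxHLS abstract above (keeping Pareto-optimal designs for a given error threshold), the following Python sketch filters a list of candidate designs. It is not the authors' DSEwam implementation: the `DesignPoint` type, its `area` and `error` fields, and the sample values are hypothetical, and the estimates are assumed to come from some analytical model.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str     # hypothetical label for one choice of approximate units
    area: float   # estimated resource cost (lower is better), assumed given
    error: float  # estimated output error (lower is better), assumed given


def pareto_front(points, max_error):
    """Return the feasible points that no other feasible point dominates."""
    feasible = [p for p in points if p.error <= max_error]
    front = []
    best_error = float("inf")
    # After sorting by area, a point is non-dominated exactly when its error
    # is strictly lower than that of every cheaper (or equal-area) point.
    for p in sorted(feasible, key=lambda p: (p.area, p.error)):
        if p.error < best_error:
            front.append(p)
            best_error = p.error
    return front


# Hypothetical candidates, for illustration only.
candidates = [
    DesignPoint("A", 10.0, 0.00),
    DesignPoint("B", 7.0, 0.02),
    DesignPoint("C", 8.0, 0.03),
    DesignPoint("D", 5.0, 0.08),
    DesignPoint("E", 6.0, 0.04),
]
front = pareto_front(candidates, max_error=0.05)
```

Here `D` is rejected by the error threshold and `C` is dominated by `B`, which is both smaller and more accurate.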
fuseGNN
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415610
Zhaodong Chen, Mingyu Yan, Maohua Zhu, Lei Deng, Guoqi Li, Shuangchen Li, Yuan Xie
Abstract: Graph convolutional neural networks (GNNs) have achieved state-of-the-art performance on tasks like node classification and have become a new workload family in data centers. GNNs work on irregular graph-structured data with three distinct phases: Combination, Graph Processing, and Aggregation. While the Combination phase has been well supported by sgemm kernels in cuBLAS, the other two phases are still inefficient on GPGPU due to the lack of optimized CUDA kernels. In particular, the Aggregation phase introduces a large DRAM storage footprint and a large volume of data movement, and both the Aggregation and Graph Processing phases suffer from high kernel launching time. These inefficiencies not only decrease training throughput but also limit users from training GNNs on larger graphs on GPGPU. Although these problems have been partially alleviated by recent studies, their optimizations are still not sufficient. In this paper, we propose fuseGNN, an extension of PyTorch that provides highly optimized APIs and CUDA kernels for GNNs. First, two different programming abstractions for the Aggregation phase are utilized to handle graphs with different average degrees. Second, dedicated GPGPU kernels are developed for Aggregation and Graph Processing in both forward and backward passes, in which kernel fusion along with other optimization strategies is applied to reduce kernel launching time and latency as well as exploit data reuse opportunities. Evaluation on multiple benchmarks shows that fuseGNN achieves up to 5.3× end-to-end speedup over state-of-the-art frameworks, and the DRAM storage footprint is reduced by several orders of magnitude on large datasets.
Citations: 17
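The fuseGNN abstract above mentions two programming abstractions for the Aggregation phase without naming them. The Python sketch below shows two common ways to express neighbour-sum aggregation, an edge-centric scatter-add and a node-centric reduction over CSR rows. Treating these as the two abstractions is an assumption made here for illustration; the function names and the tiny graph are hypothetical, and the code is plain Python rather than the paper's CUDA kernels.

```python
def aggregate_edge_centric(num_nodes, edges, feats):
    """Scatter-add: every edge (src, dst) adds feats[src] into out[dst]."""
    dim = len(feats[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        for k in range(dim):
            out[dst][k] += feats[src][k]
    return out


def aggregate_node_centric(row_ptr, col_idx, feats):
    """Reduce over CSR rows: row i lists the in-neighbours of node i."""
    dim = len(feats[0])
    out = []
    for i in range(len(row_ptr) - 1):
        acc = [0.0] * dim  # the partial sum stays local until the row is done
        for j in range(row_ptr[i], row_ptr[i + 1]):
            src = col_idx[j]
            for k in range(dim):
                acc[k] += feats[src][k]
        out.append(acc)
    return out


# Hypothetical 3-node graph: edges 0->1, 2->1, 1->0.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 1), (2, 1), (1, 0)]
row_ptr = [0, 1, 3, 3]   # CSR over destination nodes
col_idx = [1, 0, 2]      # in-neighbours of node 0, then of node 1

by_edge = aggregate_edge_centric(3, edges, feats)
by_node = aggregate_node_centric(row_ptr, col_idx, feats)
```

Both functions compute the same sums. The edge-centric form exposes one unit of work per edge, while the node-centric form accumulates each output row locally, which is one plausible reason to pick between them based on average degree.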
PathDriver
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415725
Xing Huang, Youlin Pan, Grace Li Zhang, Bing Li, Wenzhong Guo, Tsung-Yi Ho, Ulf Schlichtmann
Abstract: Continuous-flow microfluidic biochips have attracted high research interest over the past years. Inside such a chip, fluid samples of milliliter volumes are efficiently transported between devices (e.g., mixers) to automatically perform various laboratory procedures in biology and biochemistry. Each transportation task, however, requires an exclusive flow path composed of multiple contiguous microchannels during its execution period. Excess/waste fluids, in the meantime, should be discarded by independent flow paths connected to waste ports. All these paths are etched in a very tiny chip area using multilayer soft lithography and driven by flow ports connecting with external pressure sources, forming a highly integrated chip architecture that dominates the performance of biochips. In this paper, we propose a practical synthesis flow called PathDriver for the design automation of microfluidic biochips, integrating the actual fluid manipulations into both high-level synthesis and physical design, which has never been considered in prior work. Given the protocols of biochemical applications, PathDriver aims to generate highly efficient chip architectures with a flow-path network that enables the manipulation of actual fluid transportation and removal. Additionally, fluid volume management between devices and flow-path minimization are realized for the first time, thus ensuring the correctness of assay outcomes while reducing the complexity of chip architectures. Experimental results on multiple benchmarks demonstrate the effectiveness of the proposed synthesis flow.
Citations: 7
SWIPE
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415642
Sujan Kumar Gonugondla, Ameya D. Patil, Naresh R Shanbhag
Abstract: Crossbar-based in-memory architectures have emerged as an attractive platform for energy-efficient realization of deep neural networks (DNNs). A key challenge in such architectures is achieving accurate and efficient writes due to the presence of bitcell conductance variations. In this paper, we propose the Single-Write In-memory Program-vErify (SWIPE) method that achieves high-accuracy writes for crossbar-based in-memory architectures at 5×-to-10× lower cost than standard program-verify methods. SWIPE leverages the bit-sliced attribute of crossbar-based in-memory architectures and the statistics of conductance variations to compensate for device non-idealities. Using SWIPE to write into a ReRAM crossbar allows for a 2× (CIFAR-10) and 3× (MNIST) increase in storage density with < 1% loss in DNN accuracy. In particular, SWIPE compensates for 4.8×-to-7.7× higher conductance variations. Furthermore, SWIPE can be augmented with injection-based training methods in order to achieve even greater enhancements in robustness.
Citations: 6
ReTransformer
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415640
Xiaoxuan Yang, Bonan Yan, Hai Li, Yiran Chen
Abstract: Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and has demonstrated excellent performance in neural machine translation, entity recognition, etc. However, its scaled dot-product attention mechanism in the auto-regressive decoder brings a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of the scaled dot-product attention in Transformer makes it difficult to directly apply these designs. Besides, how to handle intermediate results in Matrix-matrix Multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer - a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer can not only accelerate the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminate some data dependency by avoiding writing the intermediate results using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that compared to GPU and Pipelayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively. The corresponding overall power is reduced by 1086× and 2.82×, respectively.
Citations: 33
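For readers unfamiliar with the operation that the ReTransformer abstract above targets, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(Q Kᵀ / √d_k) V. The second function illustrates one way a matrix decomposition can avoid materialising an intermediate matrix: since K = X W_K, the scores Q Kᵀ equal (Q W_Kᵀ) Xᵀ by associativity. Whether this matches the paper's exact decomposition is an assumption; the abstract does not spell it out, and all function names and sample matrices here are hypothetical.

```python
import math


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def scores_decomposed(q, w_k, x):
    """Q K^T computed as (Q W_K^T) X^T, without forming K = X W_K first."""
    return matmul(matmul(q, transpose(w_k)), transpose(x))


# Hypothetical small inputs, for illustration only.
x = [[1.0, 2.0], [3.0, 4.0]]
w_k = [[1.0, 0.0], [1.0, 1.0]]
q = [[1.0, 1.0]]
direct = matmul(q, transpose(matmul(x, w_k)))   # forms K explicitly
decomposed = scores_decomposed(q, w_k, x)       # never forms K
```

Both score computations agree, so the choice between them is about which operands must be written to memory, not about the result.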
COALA
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415721
Yun-Jhe Jiang, Shao-Yun Fang
Abstract: Two-dimensional (2D) global routing followed by layer assignment is a common and popular strategy to obtain a good trade-off between runtime and routing performance. Yet, the huge gap between 2D routing patterns and the final 3D routing paths often results in inevitable overflow after layer assignment. State-of-the-art studies on layer assignment usually adopt dynamic programming-based approaches to sequentially find an optimal solution for each net in terms of overflow and/or the number of vias. However, a fixed assignment ordering severely restricts the solution space, and the distributed overflows can hardly be resolved with any existing refinement approach. This paper proposes a novel layer assignment framework that concurrently considers all the wire segments of nets and iteratively assigns them from the lowest available layer to the highest one. The concurrent scheme facilitates the maximal utilization of routing resources on each layer, contributing to an effective re-routing procedure that greatly reduces inevitable overflows. Experimental results show that, compared to sequential layer assignment solutions that are also refined by the same re-routing procedure, the proposed framework reduces the maximum overflow in a tile by 32% on average and reduces the number of tiles with overflows by 28% with much less runtime, which shows the significant advantage of concurrent layer assignment over sequential methods.
Citations: 2
Problem C: GPU accelerated logic re-simulation
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415740
Yanqing Zhang, Haoxing Ren, Ben Keller, Brucek Khailany
Abstract: Logic "re"-simulation can be defined as gate-level simulation where the input waveforms at every primary input and pseudo-primary input (such as register/RAM outputs) are known. Such waveforms could come from the unit's RTL simulation trace or Automatic Test Pattern Generation (ATPG) vectors. This type of simulation is useful for functional verification of gate-level netlists and for power analysis, since we can take the known trace on all primary and pseudo-primary inputs, re-simulate the trace using propagation of signals through timing-aware gate-level combinational logic, and verify that results at the primary and pseudo-primary outputs match the reference RTL simulation results. However, gate-level simulation is usually much slower than RTL simulation. Thus, there is motivation for faster solutions. In this contest, we ask contestants to use Graphics Processing Units (GPUs) to speed up the re-simulation task.
Citations: 5
iTPlace
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415613
Tai-Cheng Lee, Chenghan. Yang, Yih-Lang Li
Abstract: Cell layout synthesis is a critical stage in modern digital IC design. In previous automatic synthesis solutions, algorithms always consider only cell area and routability. This is the first work to propose a method of delay-aware transistor placement for cell library synthesis at the sign-off level. We consider the delay and area of a cell in the transistor placement stage. Our methodology consists of three major steps. First, a search tree finds the candidate placement list that has the smallest area in a large search space. Then, a neural network filters out the unroutable candidates. Finally, a comparative convolutional neural network model, trained by sign-off level data, sorts the delays during the early placement stage. The experimental results show that the proposed CNN-based routable classifier can achieve up to 98% accuracy, and the proposed CNN-based delay ranker can also achieve up to 94.6% accuracy. The work obtains a 1.77% average sequential component delay improvement over the traditional cell synthesis method. Our method also has a 0.97% better delay performance than the human-level design.
Citations: 3
ASAP
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415626
Yi-Chen Chang, Hongjia Li, Olivia Chenht, Yanzhi Wang, N. Yoshikawa, Tsung-Yi Ho
Abstract: Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with very low energy dissipation. Each AQFP cell is driven by AC power, which serves as both power supply and clock signal. The clock signals trigger the data flow from one clock phase to the next clock phase, and the delay for each output in the same phase has to be equal. At the same time, the signal current attenuates as the wire becomes longer. When a wire exceeds a maximum length, the weak current causes incorrect data. Thus, rows of buffers have to be inserted as repeaters to satisfy both the delay synchronization and the wirelength constraint. These inserted buffers significantly increase the power consumption and also the total delay of AQFP circuits. In this paper, we propose an analytical strategy for AQFP placement (ASAP) to provide effective placement results that greatly reduce the number of additional inserted buffers. ASAP includes two main characteristics: 1) a new wire-length function for analytical global placement and 2) detailed placement including fixed-order Lagrangian relaxation and a cell balancing algorithm. Experimental results show the efficiency of the ASAP framework and a 53% reduction of buffers over the state-of-the-art method.
Citations: 9
F2VD
Proceedings of the 39th International Conference on Computer-Aided Design Pub Date : 2020-11-02 DOI: 10.1145/3400302.3415716
Kecheng Yang, Ashikahmed Bhuiyan, Zhishan Guo
Abstract: Increasingly complex and integrated systems design has led to more timing uncertainty, which may result in pessimism in time-sensitive system design and analysis. To mitigate such pessimism, mixed-criticality (MC) design for real-time systems has been proposed, where highly critical tasks, often with extremely pessimistic execution time estimates, can share the processor with less critical ones in a manner that the latter are sacrificed, completely or partially, to guarantee temporal correctness to the former, when the extremely pessimistic scenario does happen. In contrast to such sacrifice of tasks, the precise MC scheduling model has recently been investigated, where all tasks, including less critical ones, must fully complete their execution in all circumstances. Meanwhile, the processor may operate at a degraded speed when the tasks' runtime behaviors are far from the extremely pessimistic estimates and would recover to the full processing speed once the extremely pessimistic scenario does happen. This paper presents a generalized fluid-scheduling-based solution to this problem, where feasible fluid-scheduling rates for each task are derived from an optimization problem. Furthermore, this paper proposes a novel algorithm F2VD for setting virtual deadlines from any feasible fluid rates, such that any fluid-scheduling-based solution can be converted to a deadline-based scheduling approach with no schedulability loss, where the latter is generally considered much more practical and easier to implement. Experimental studies based on randomly generated task sets are conducted to verify the theoretical results as well as the effectiveness of the proposed algorithms.
Citations: 17