{"title":"AxHLS","authors":"Jorge Castro-Godínez, Julián Mateus-Vargas, M. Shafique, Jörg Henkel","doi":"10.1145/3400302.3415732","DOIUrl":"https://doi.org/10.1145/3400302.3415732","url":null,"abstract":"With the emergence of approximate computing as a design paradigm, many approximate functional units have been proposed, particularly approximate adders and multipliers. These circuits trade the accuracy of their results, within a tolerable limit, for reduced computational effort and energy consumption. However, given the growing number of such approximate circuits reported in the literature, selecting those that minimize the resources required to design and generate an approximate accelerator from a high-level specification, while satisfying a defined accuracy constraint, is a joint high-level synthesis (HLS) and design space exploration (DSE) challenge. In this paper, we propose a novel automated framework for HLS of approximate accelerators using a given library of approximate functional units. Because repetitive circuit synthesis and gate-level simulation require significant time, we enable our framework with AxME, a set of analytical models for estimating the computational resources required when approximate adders and multipliers are used in a design. We propose DSEwam, a DSE methodology for error-tolerant applications in which analytical models such as AxME estimate the resources and accuracy of approximate designs. Furthermore, we integrate DSEwam into an HLS tool to automatically generate Pareto-optimal, or near Pareto-optimal, approximate accelerators from C language descriptions for a given error threshold and minimization goal.
We release our DSE framework as an open-source contribution, which will significantly boost the research and development in the field of automatic generation of approximate accelerators.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126557070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
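The joint selection problem that AxHLS addresses can be illustrated with a toy exploration loop: given a characterized library of approximate units, enumerate configurations and keep the cheapest one that still meets the error budget. This is a minimal sketch only; the unit names, area costs, error estimates, and the naively additive error model are illustrative assumptions, not values or models from the paper.

```python
# Toy library-based DSE in the spirit of AxHLS: for each operator, pick an
# approximate unit from a characterized library so total area is minimized
# while the summed error estimate stays under a threshold.
from itertools import product

# Hypothetical characterized library: (name, area_cost, error_estimate).
ADDERS = [("exact_add", 100, 0.0), ("ax_add_a", 70, 0.02), ("ax_add_b", 45, 0.08)]
MULTS = [("exact_mul", 400, 0.0), ("ax_mul_a", 260, 0.03), ("ax_mul_b", 180, 0.10)]

def explore(error_threshold):
    """Enumerate all configurations; return the minimum-area one whose
    (naively additive) error estimate meets the threshold."""
    best = None
    for add, mul in product(ADDERS, MULTS):
        error = add[2] + mul[2]   # simple additive error model (assumption)
        area = add[1] + mul[1]
        if error <= error_threshold and (best is None or area < best[1]):
            best = ((add[0], mul[0]), area, error)
    return best

config, area, err = explore(0.06)
```

A real flow replaces the additive error model with analytical estimates like AxME and prunes the (exponential) configuration space rather than enumerating it.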
{"title":"fuseGNN","authors":"Zhaodong Chen, Mingyu Yan, Maohua Zhu, Lei Deng, Guoqi Li, Shuangchen Li, Yuan Xie","doi":"10.1145/3400302.3415610","DOIUrl":"https://doi.org/10.1145/3400302.3415610","url":null,"abstract":"Graph convolutional neural networks (GNNs) have achieved state-of-the-art performance on tasks such as node classification and have become a new family of data-center workloads. GNNs operate on irregular graph-structured data in three distinct phases: Combination, Graph Processing, and Aggregation. While the Combination phase is well supported by the sgemm kernels in cuBLAS, the other two phases remain inefficient on GPGPUs due to the lack of optimized CUDA kernels. In particular, the Aggregation phase introduces a large DRAM storage footprint and heavy data movement, and both the Aggregation and Graph Processing phases suffer from high kernel launch time. These inefficiencies not only decrease training throughput but also prevent users from training GNNs on larger graphs on GPGPUs. Although these problems have been partially alleviated by recent studies, the existing optimizations are still insufficient. In this paper, we propose fuseGNN, an extension of PyTorch that provides highly optimized APIs and CUDA kernels for GNNs. First, two different programming abstractions for the Aggregation phase handle graphs with different average degrees. Second, dedicated GPGPU kernels are developed for Aggregation and Graph Processing in both the forward and backward passes, in which kernel fusion and other optimization strategies reduce kernel launch time and latency and exploit data-reuse opportunities.
Evaluation on multiple benchmarks shows that fuseGNN achieves up to 5.3× end-to-end speedup over state-of-the-art frameworks, and the DRAM storage footprint is reduced by several orders of magnitude on large datasets.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122693794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
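The memory saving that fuseGNN attributes to kernel fusion can be seen in miniature: an unfused aggregation first materializes one message per edge (an O(|E|·F) buffer) and then reduces, while a fused version accumulates each message into its destination row as it is produced, so the per-edge buffer never exists. Plain Python stands in for the CUDA kernels described in the paper; the graph and feature values are illustrative.

```python
# Unfused vs. fused neighbor aggregation (sum over incoming edges).

def aggregate_unfused(edges, features):
    """Two passes: build all edge messages, then sum per destination node."""
    messages = [(dst, features[src]) for src, dst in edges]  # O(|E|) buffer
    out = {v: [0.0] * len(next(iter(features.values()))) for v in features}
    for dst, msg in messages:
        out[dst] = [a + b for a, b in zip(out[dst], msg)]
    return out

def aggregate_fused(edges, features):
    """One pass: accumulate directly; no intermediate message buffer."""
    out = {v: [0.0] * len(next(iter(features.values()))) for v in features}
    for src, dst in edges:
        out[dst] = [a + b for a, b in zip(out[dst], features[src])]
    return out

feats = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0]}
edge_list = [(0, 2), (1, 2), (2, 0)]
```

On a GPU the fused variant additionally saves the kernel-launch and DRAM round-trip between the two passes, which is the effect the paper quantifies.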
{"title":"PathDriver","authors":"Xing Huang, Youlin Pan, Grace Li Zhang, Bing Li, Wenzhong Guo, Tsung-Yi Ho, Ulf Schlichtmann","doi":"10.1145/3400302.3415725","DOIUrl":"https://doi.org/10.1145/3400302.3415725","url":null,"abstract":"Continuous-flow microfluidic biochips have attracted considerable research interest in recent years. Inside such a chip, fluid samples of milliliter volumes are efficiently transported between devices (e.g., mixers) to automatically perform various laboratory procedures in biology and biochemistry. Each transportation task, however, requires an exclusive flow path composed of multiple contiguous microchannels during its execution period. Excess and waste fluids, meanwhile, must be discarded through independent flow paths connected to waste ports. All these paths are etched into a very small chip area using multilayer soft lithography and driven by flow ports connected to external pressure sources, forming a highly integrated chip architecture that dominates biochip performance. In this paper, we propose a practical synthesis flow called PathDriver for the design automation of microfluidic biochips. It integrates actual fluid manipulations into both high-level synthesis and physical design, which no prior work has considered. Given the protocols of biochemical applications, PathDriver generates highly efficient chip architectures with a flow-path network that supports actual fluid transportation and removal. Additionally, fluid volume management between devices and flow-path minimization are realized for the first time, ensuring the correctness of assay outcomes while reducing the complexity of the chip architecture.
Experimental results on multiple benchmarks demonstrate the effectiveness of the proposed synthesis flow.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116675314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
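The exclusivity constraint at the heart of PathDriver's flow-path network can be pictured as a graph search: a transportation task needs a contiguous chain of free microchannels from a source to a destination, and channels held by concurrently executing tasks are unavailable. The sketch below models the chip as an undirected channel graph and finds such a path with BFS; the port names and graph are illustrative assumptions, not the paper's formulation.

```python
# Find an exclusive flow path over free microchannels with BFS.
from collections import deque

def find_flow_path(channels, occupied, src, dst):
    """BFS over channels not held by other tasks; returns a node list or None."""
    adj = {}
    for u, v in channels:
        if (u, v) in occupied or (v, u) in occupied:
            continue                      # channel reserved by another task
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    queue, prev = deque([src]), {src: None}
    while queue:
        node = queue.popleft()
        if node == dst:                   # reconstruct path back to the source
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None                           # no free contiguous path exists

channel_list = [("in", "mix"), ("mix", "out"), ("in", "out")]
```

The synthesis problem in the paper is much harder than one such query: paths for all tasks, waste removal, and volume management must be planned jointly under the shared-channel constraint.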
{"title":"SWIPE","authors":"Sujan Kumar Gonugondla, Ameya D. Patil, Naresh R. Shanbhag","doi":"10.1145/3400302.3415642","DOIUrl":"https://doi.org/10.1145/3400302.3415642","url":null,"abstract":"Crossbar-based in-memory architectures have emerged as an attractive platform for energy-efficient realization of deep neural networks (DNNs). A key challenge in such architectures is achieving accurate and efficient writes in the presence of bitcell conductance variations. In this paper, we propose the Single-Write In-memory Program-vErify (SWIPE) method, which achieves high-accuracy writes for crossbar-based in-memory architectures at 5×-to-10× lower cost than standard program-verify methods. SWIPE leverages the bit-sliced attribute of crossbar-based in-memory architectures and the statistics of conductance variations to compensate for device non-idealities. Using SWIPE to write into a ReRAM crossbar allows a 2× (CIFAR-10) and 3× (MNIST) increase in storage density with < 1% loss in DNN accuracy. In particular, SWIPE compensates for 4.8×-to-7.7× higher conductance variations. Furthermore, SWIPE can be augmented with injection-based training methods to achieve even greater robustness.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127421350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
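The contrast between SWIPE's single write and a classic program-verify loop can be sketched numerically: the loop repeatedly writes, reads back, and corrects, while a statistics-aware write pre-compensates the programming target once. The linear device model below and all its parameters are illustrative assumptions for the sketch, not the paper's device model.

```python
# Iterative program-verify vs. a single statistically pre-compensated write.

def device_write(target, gain=1.05, offset=0.3, noise=0.0):
    """Toy non-ideal bitcell: actual conductance = gain*target + offset + noise."""
    return gain * target + offset + noise

def program_verify(target, tol=0.01, max_iters=50):
    """Classic loop: write, read back, correct, until within tolerance."""
    guess, writes = target, 0
    actual = None
    while writes < max_iters:
        actual = device_write(guess)
        writes += 1
        if abs(actual - target) <= tol:
            break
        guess -= (actual - target)        # simple proportional correction
    return actual, writes

def single_write(target, gain=1.05, offset=0.3):
    """SWIPE-style idea: pre-compensate once using known variation statistics."""
    return device_write((target - offset) / gain), 1

sw_val, sw_writes = single_write(2.0)
pv_val, pv_writes = program_verify(2.0)
```

With a deterministic model the single write lands exactly; with real stochastic variations, SWIPE's point is that statistical compensation gets close enough for bit-sliced DNN weights without the verify iterations.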
{"title":"ReTransformer","authors":"Xiaoxuan Yang, Bonan Yan, Hai Li, Yiran Chen","doi":"10.1145/3400302.3415640","DOIUrl":"https://doi.org/10.1145/3400302.3415640","url":null,"abstract":"Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and has demonstrated excellent performance in neural machine translation, entity recognition, etc. However, its scaled dot-product attention mechanism in the auto-regressive decoder creates a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of the scaled dot-product attention in Transformer makes it difficult to directly apply these designs. In addition, how to handle intermediate results in matrix-matrix multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer, a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer not only accelerates the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminates some data dependencies by avoiding writing out intermediate results, using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that compared to GPU and Pipelayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively.
The corresponding overall power is reduced by 1086× and 2.82×, respectively.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"2673 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114666314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
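The intermediate-result problem ReTransformer targets can be seen in a small sketch: computed naively, attention writes out the full Q·Kᵀ score matrix before the softmax; processed one query sub-block at a time, only a slice of that matrix ever exists. Plain Python row vectors stand in for the ReRAM crossbar operations, and this per-row streaming is an illustration of the general idea, not the paper's matrix decomposition technique.

```python
# Scaled dot-product attention, streamed one query row at a time so the
# full score matrix is never materialized.
import math

def softmax(xs):
    m = max(xs)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_streamed(Q, K, V):
    """For each query row: scores against all keys, softmax, weighted sum of V.
    Only one row of the Q*K^T intermediate exists at any time."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                          for k in K])
        out.append([sum(w * v[j] for w, v in zip(scores, V))
                    for j in range(len(V[0]))])
    return out
```

In a PIM setting, avoiding the write-out matters doubly: ReRAM writes are slow and energy-hungry, so removing the intermediate both shortens the pipeline and cuts power.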
{"title":"COALA","authors":"Yun-Jhe Jiang, Shao-Yun Fang","doi":"10.1145/3400302.3415721","DOIUrl":"https://doi.org/10.1145/3400302.3415721","url":null,"abstract":"Two-dimensional (2D) global routing followed by layer assignment is a common and popular strategy to obtain a good trade-off between runtime and routing performance. Yet, the large gap between 2D routing patterns and the final 3D routing paths often results in unavoidable overflow after layer assignment. State-of-the-art studies on layer assignment usually adopt dynamic programming-based approaches that sequentially find an optimal solution for each net in terms of overflow and/or the number of vias. However, a fixed assignment ordering severely restricts the solution space, and the resulting distributed overflows can hardly be resolved by any existing refinement approach. This paper proposes a novel layer assignment framework that concurrently considers all the wire segments of nets and iteratively assigns them from the lowest available layer to the highest one. The concurrent scheme maximizes the utilization of routing resources on each layer, enabling an effective re-routing procedure that greatly reduces overflow.
Experimental results show that, compared to sequential layer assignment solutions that are also refined by the same re-routing procedure, the proposed framework reduces the maximum overflow in a tile by 32% on average and the number of overflowed tiles by 28%, with much less runtime, demonstrating the significant advantage of concurrent layer assignment over sequential methods.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128092321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
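The bottom-up concurrent idea in COALA can be reduced to a toy: rather than fixing a net ordering, sweep the layers from lowest to highest and, on each layer, admit as many still-unassigned wire segments as the per-tile capacity allows. The flat per-tile capacity model and single-tile segments below are illustrative simplifications, not the paper's cost model.

```python
# Concurrent bottom-up layer assignment over a per-tile capacity model.

def assign_layers(segments, num_layers, capacity_per_layer):
    """segments: list of (seg_id, tile). Returns ({seg_id: layer}, overflow),
    where overflow lists segments that fit on no layer."""
    assignment, leftover = {}, list(segments)
    for layer in range(num_layers):
        used = {}                               # tile -> wires on this layer
        still = []
        for seg_id, tile in leftover:
            if used.get(tile, 0) < capacity_per_layer:
                used[tile] = used.get(tile, 0) + 1
                assignment[seg_id] = layer
            else:
                still.append((seg_id, tile))    # try a higher layer next sweep
        leftover = still
    return assignment, leftover
```

Because every leftover segment competes again on the next layer, no net is permanently penalized by an early ordering decision, which is the property the paper's concurrent scheme exploits.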
{"title":"Problem C: GPU accelerated logic re-simulation","authors":"Yanqing Zhang, Haoxing Ren, Ben Keller, Brucek Khailany","doi":"10.1145/3400302.3415740","DOIUrl":"https://doi.org/10.1145/3400302.3415740","url":null,"abstract":"Logic \"re\"-simulation can be defined as gate-level simulation in which the input waveforms at every primary input and pseudo-primary input (such as register/RAM outputs) are known. Such waveforms could come from the unit's RTL simulation trace or Automatic Test Pattern Generation (ATPG) vectors. This type of simulation is useful for functional verification of gate-level netlists and for power analysis: we can take the known trace on all primary and pseudo-primary inputs, re-simulate it by propagating signals through timing-aware gate-level combinational logic, and verify that the results at the primary and pseudo-primary outputs match the reference RTL simulation results. However, gate-level simulation is usually much slower than RTL simulation, which motivates faster solutions. In this contest, we ask contestants to use Graphics Processing Units (GPUs) to speed up the re-simulation task.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131477380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
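The structure that makes re-simulation GPU-friendly is worth making concrete: with all primary and pseudo-primary input waveforms known, the combinational netlist can be evaluated level by level in topological order, and all gates within a level are independent, so each level is a natural parallel kernel. The sketch below is a zero-delay serial reference; the netlist tuple format is an illustrative assumption, not the contest's input format, and the real problem is timing-aware.

```python
# Levelized zero-delay gate re-simulation over known input waveforms.

GATE_FUNCS = {"AND": lambda a, b: a & b,
              "OR": lambda a, b: a | b,
              "XOR": lambda a, b: a ^ b}

def resimulate(levels, input_waveform):
    """levels: list of lists of (out_net, gate_type, in_a, in_b), already
    topologically levelized. input_waveform: {net: [value per cycle]}.
    Returns the waveform of every net."""
    nets = {n: list(vs) for n, vs in input_waveform.items()}
    cycles = len(next(iter(input_waveform.values())))
    for level in levels:              # levels must run in order...
        for out, g, a, b in level:    # ...but gates within a level are independent
            nets[out] = [GATE_FUNCS[g](nets[a][t], nets[b][t])
                         for t in range(cycles)]
    return nets
```

A GPU solution maps the inner loop to threads (one per gate, or per gate-cycle pair) and launches one kernel, or one fused kernel stage, per level.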
{"title":"iTPlace","authors":"Tai-Cheng Lee, Chenghan Yang, Yih-Lang Li","doi":"10.1145/3400302.3415613","DOIUrl":"https://doi.org/10.1145/3400302.3415613","url":null,"abstract":"Cell layout synthesis is a critical stage in modern digital IC design. Previous automatic synthesis solutions consider only cell area and routability. This is the first work to propose delay-aware transistor placement for cell library synthesis at the sign-off level: we account for both the delay and the area of a cell during the transistor placement stage. Our methodology consists of three major steps. First, a search tree finds the candidate placements with the smallest area in a large search space. Then, a neural network filters out the unroutable candidates. Finally, a comparative convolutional neural network (CNN) model, trained on sign-off-level data, ranks the candidates by delay during the early placement stage. The experimental results show that the proposed CNN-based routability classifier achieves up to 98% accuracy, and the proposed CNN-based delay ranker achieves up to 94.6% accuracy. The work obtains a 1.77% average sequential-component delay improvement over the traditional cell synthesis method. Our method also delivers 0.97% better delay than the human-level design.","PeriodicalId":367868,"journal":{"name":"Proceedings of the 39th International Conference on Computer-Aided Design","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114513104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
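The ranking step in a flow like this one is easy to sketch: a comparative model that, given two candidate placements, predicts which has smaller delay can serve directly as a sort comparator, ordering candidates without ever predicting absolute delays. Below, a trivial hand-written heuristic stands in for the trained CNN, and the candidate features (`wirelength`, `stack_depth`) are illustrative assumptions, not the paper's feature set.

```python
# Rank candidate placements using only pairwise "which is faster?" comparisons.
from functools import cmp_to_key

def compare_delay(cand_a, cand_b):
    """Stand-in for the trained comparative model: a toy heuristic score on
    illustrative features. Negative means cand_a is predicted faster."""
    score_a = cand_a["wirelength"] + 3 * cand_a["stack_depth"]
    score_b = cand_b["wirelength"] + 3 * cand_b["stack_depth"]
    return -1 if score_a < score_b else (1 if score_a > score_b else 0)

def rank_candidates(candidates):
    """Order candidate placements best-first via the pairwise comparator."""
    return sorted(candidates, key=cmp_to_key(compare_delay))
```

Learning a pairwise comparator instead of a delay regressor sidesteps absolute-delay calibration: the model only has to be right about orderings, which is all the placement flow consumes.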