2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Publications

Solving Least-Squares Fitting in $O(1)$ Using RRAM-based Computing-in-Memory Technique

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/asp-dac52403.2022.9712568

Xiaoming Chen, Yinhe Han

Citations: 0

HACScale: Hardware-Aware Compound Scaling for Resource-Efficient DNNs

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/ASP-DAC52403.2022.9712593

Hao Kong, Di Liu, Xiangzhong Luo, Weichen Liu, Ravi Subramaniam

Abstract: Model scaling is an effective way to improve the accuracy of deep neural networks (DNNs) by increasing the model capacity. However, existing approaches seldom consider the underlying hardware, which leads to inefficient utilization of hardware resources and consequently high inference latency. In this paper, we propose HACScale, a hardware-aware model scaling strategy that fully exploits hardware resources for higher accuracy. In HACScale, different dimensions of DNNs are jointly scaled with consideration of their contributions to hardware utilization and accuracy. To improve the efficiency of width scaling, we introduce importance-aware width scaling in HACScale, which computes the importance of each layer to the accuracy and scales each layer accordingly to optimize the trade-off between accuracy and model parameters. Experiments show that HACScale improves hardware utilization by 1.92× on ImageNet; as a result, it achieves a 2.41% accuracy improvement with a negligible latency increase of 0.6%. On CIFAR-10, HACScale improves accuracy by 2.23% with only 6.5% latency growth.

Citations: 1
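The HACScale abstract does not specify how layer importance is computed or how it is turned into per-layer widths. As a rough illustration only, the Python sketch below assumes a precomputed importance score per layer (hypothetical inputs). Around a global multiplier, it widens more important layers more than less important ones, then rounds each channel count up to a multiple of 8. That multiple is an assumed stand-in for a hardware-friendly lane width. This is not HACScale's algorithm, and it ignores depth and resolution scaling.

```python
import math

def importance_aware_widths(base_channels, importance, global_mult, align=8):
    """Hypothetical sketch: widen each layer by a factor that grows with its
    (precomputed) importance score, then round the channel count up to a
    multiple of `align` so it fills hardware lanes evenly."""
    if len(base_channels) != len(importance):
        raise ValueError("need exactly one importance score per layer")
    mean_imp = sum(importance) / len(importance)
    widths = []
    for channels, imp in zip(base_channels, importance):
        # Average-importance layers get exactly global_mult; more important
        # layers get more, less important ones less (but never shrink).
        factor = max(1.0, 1.0 + (global_mult - 1.0) * imp / mean_imp)
        widths.append(align * math.ceil(channels * factor / align))
    return widths

print(importance_aware_widths([32, 64, 128], [0.5, 1.0, 1.5], 1.5))  # [40, 96, 224]
```

With a global multiplier of 1.5, the least important layer grows by 1.25× and the most important by 1.75×. All resulting widths stay multiples of 8.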

Optimal Data Allocation for Graph Processing in Processing-in-Memory Systems

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/asp-dac52403.2022.9712587

Zerun Li, Xiaoming Chen, Yinhe Han

Abstract: Graph processing involves many irregular memory accesses and demands high memory bandwidth, which makes it difficult to execute efficiently on compute-centric architectures. Dedicated graph processing accelerators based on the processing-in-memory (PIM) technique have recently been proposed. Although they achieve higher performance and energy efficiency than conventional architectures, the data allocation problem for communication minimization in PIM systems (e.g., hybrid memory cubes (HMCs)) has still not been well solved. In this paper, we demonstrate that the conventional “graph data allocation = graph partitioning” assumption does not hold, and that the memory access patterns of graph algorithms should also be taken into account when partitioning graph data for communication minimization. For this purpose, we classify graph algorithms into two representative classes from a memory access pattern point of view and propose different graph data partitioning strategies for them. We then propose two algorithms to optimize the partition-to-HMC mapping to minimize inter-HMC communication. Evaluations demonstrate the superiority of our data allocation framework: data movement energy efficiency improves by 4.2-5× on average over the state-of-the-art GraphP approach.

Citations: 4
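The abstract names two partition-to-HMC mapping algorithms but does not describe them. To make the optimization target concrete, the sketch below uses exhaustive search instead. This is a swapped-in technique that is feasible only for a handful of partitions. It assigns partitions to HMCs on an assumed 2D mesh and minimizes inter-partition traffic weighted by hop count. The traffic matrix, the mesh topology, and the one-partition-per-HMC assumption are all illustrative.

```python
from itertools import permutations

def mesh_hops(a, b, cols):
    """Manhattan distance between HMCs a and b in a row-major 2D mesh."""
    return abs(a // cols - b // cols) + abs(a % cols - b % cols)

def mapping_cost(traffic, mapping, cols):
    """Total traffic x hops, where mapping[i] is the HMC holding partition i."""
    n = len(traffic)
    return sum(traffic[i][j] * mesh_hops(mapping[i], mapping[j], cols)
               for i in range(n) for j in range(n) if i != j)

def best_mapping(traffic, cols):
    """Exhaustive search over all n! one-to-one partition-to-HMC assignments."""
    best, best_cost = None, None
    for mapping in permutations(range(len(traffic))):
        cost = mapping_cost(traffic, mapping, cols)
        if best_cost is None or cost < best_cost:
            best, best_cost = mapping, cost
    return best, best_cost

# Four partitions on a 2x2 mesh of HMCs; partitions 0 and 3 talk heavily.
traffic = [[0, 0, 0, 9],
           [0, 0, 1, 0],
           [0, 1, 0, 0],
           [9, 0, 0, 0]]
print(mapping_cost(traffic, (0, 1, 2, 3), 2))  # 40
print(best_mapping(traffic, 2))                # ((0, 1, 3, 2), 20)
```

The identity mapping places the heavily communicating partitions 0 and 3 on diagonal HMCs, two hops apart. The search instead finds a mapping that puts every communicating pair on adjacent HMCs, which halves the traffic-hop cost.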

A 40nm CMOS SoC for Real-Time Dysarthric Voice Conversion of Stroke Patients

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/ASP-DAC52403.2022.9712584

Tay-Jyi Lin, Chen-Zong Liao, You-Jia Hu, Wei-Cheng Hsu, Zheng-Xian Wu, Shao-Yu Wang, Chun-Ming Huang, Ying-Hui Lai, C. Yeh, Jinn-Shyan Wang

Citations: 0

Pearl: Towards Optimization of DNN-accelerators Via Closed-Form Analytical Representation

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/ASP-DAC52403.2022.9712598

Arko Dutt, Suprojit Nandy, Mays Sabry

Abstract: Hardware accelerators for deep learning are proliferating, owing to their high-speed and energy-efficient execution of deep neural network (DNN) workloads. Ensuring an efficient DNN accelerator design requires exploring a vast design space spanned by a large number of parameters. However, current exploration frameworks are limited by slow architectural simulations, which restrict the number of design points that can be examined. To address this challenge, in this paper we introduce Pearl, an analytical representation of DNN inference execution mapped to an accelerator. Pearl provides immediate estimates of the performance and energy of DNN accelerators, incorporating new parameters to capture dataflow mapping schemes beneficial for DNN systems. We formulate equations that represent the utilization rates of the compute fabric for different dataflow mappings, and we validate their accuracy against a state-of-the-art cycle-accurate DNN hardware simulator. Results show that Pearl achieves $< 1.0\%$ and $< 1.3\%$ average error in performance and energy estimates, respectively, while achieving a $> 1.2 \cdot 10^{7}\times$ simulation speedup. Pearl also shows higher average accuracy than existing analytical models that support the simulator. We further leverage Pearl to explore and optimize area-constrained DNN accelerators targeting inference at full HD resolution.

Citations: 0
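The Pearl abstract does not reproduce Pearl's equations. The sketch below only illustrates the general idea of a closed-form utilization estimate. Suppose two loop dimensions of sizes $d_r$ and $d_c$ are unrolled spatially over a $P_r \times P_c$ PE array. Each dimension is then folded $\lceil d/P \rceil$ times, so utilization is $U = \frac{d_r d_c}{\lceil d_r/P_r \rceil P_r \cdot \lceil d_c/P_c \rceil P_c}$, and any partially filled fold leaves PEs idle. The function names, the two-dimension mapping, and the stall-free cycle count are assumptions, not Pearl's model.

```python
import math

def spatial_utilization(d_rows, d_cols, pe_rows, pe_cols):
    """Fraction of PE slots doing useful work when loop dimensions of size
    d_rows and d_cols are unrolled over a pe_rows x pe_cols array."""
    folds_r = math.ceil(d_rows / pe_rows)
    folds_c = math.ceil(d_cols / pe_cols)
    return (d_rows * d_cols) / (folds_r * pe_rows * folds_c * pe_cols)

def compute_cycles(d_rows, d_cols, temporal_iters, pe_rows, pe_cols):
    """Cycles for all folds, assuming each fold runs temporal_iters cycles
    with no memory stalls (a simplifying assumption)."""
    return math.ceil(d_rows / pe_rows) * math.ceil(d_cols / pe_cols) * temporal_iters

print(spatial_utilization(64, 24, 16, 16))  # 0.75
print(compute_cycles(64, 24, 100, 16, 16))  # 800
```

Consider a 16×16 array with 64 output channels mapped onto rows and 24 reduction elements mapped onto columns. The column dimension needs two folds of 16, so only 24 of 32 column slots are busy, which caps utilization at 75%.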

This is SPATEM! A Spatial-Temporal Optimization Framework for Efficient Inference on ReRAM-based CNN Accelerator

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/ASP-DAC52403.2022.9712536

Yen-Ting Tsou, Kuan-Hsun Chen, Chia-Lin Yang, Hsiang-Yun Cheng, Jian-Jia Chen, Der-Yu Tsai

Abstract: Resistive memory-based computing-in-memory (CIM) is considered a promising solution for accelerating convolutional neural network (CNN) inference. It stores the weights in crossbar memory arrays and performs in-situ matrix-vector multiplications (MVMs) in an analog manner. Several techniques assume that a whole crossbar can operate concurrently and discuss how to efficiently map the weights onto crossbar arrays. In practice, however, the accumulated effect of per-cell current deviation and analog-to-digital converter (ADC) overhead may greatly degrade inference accuracy. This motivates the concept of the Operation Unit (OU): each operation per cycle in a crossbar involves only a limited number of wordlines and bitlines, preserving satisfactory inference accuracy. With OU-based operations, the weight mapping and the scheduling strategy for parallelizing CNN convolution operations should account for communication overhead and resource utilization in order to optimize inference acceleration. In this work, we propose SPATEM, the first optimization framework that efficiently executes MVMs with OU-based operations on ReRAM-based CIM accelerators. It decouples the design space into tractable steps, models the expected inference latency, and derives an optimized spatial-temporal-aware scheduling strategy. Compared with state-of-the-art approaches, experimental results show that the scheduling strategy derived by SPATEM reduces inference latency by 29.24% on average, with 31.28% less communication overhead, by exploiting crossbar cells that originally went unused.

Citations: 2
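The following sketch illustrates the Operation Unit concept. It computes an idealized matrix-vector product on a crossbar whose wordlines carry inputs and whose bitlines produce outputs. Instead of driving the whole array at once, it activates only an `ou_rows` × `ou_cols` block per cycle and accumulates the partial sums digitally. It assumes ideal analog behavior and ignores ADC precision, weight placement across crossbars, and communication, which are the concerns SPATEM's scheduler actually addresses. It only shows how the OU size determines the number of cycles per MVM.

```python
def ou_mvm(weights, x, ou_rows, ou_cols):
    """Compute y[c] = sum_r x[r] * weights[r][c] on an idealized crossbar,
    activating one ou_rows x ou_cols Operation Unit per cycle.
    Returns (y, cycles). Analog non-idealities and ADC effects are ignored."""
    n_rows, n_cols = len(weights), len(weights[0])
    y = [0] * n_cols
    cycles = 0
    for r0 in range(0, n_rows, ou_rows):          # group of active wordlines
        for c0 in range(0, n_cols, ou_cols):      # group of active bitlines
            cycles += 1
            for c in range(c0, min(c0 + ou_cols, n_cols)):
                # Bitline partial sum over the active rows only; in hardware
                # this is digitized and accumulated in digital logic.
                y[c] += sum(x[r] * weights[r][c]
                            for r in range(r0, min(r0 + ou_rows, n_rows)))
    return y, cycles

W = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 wordlines x 2 bitlines
x = [1, 1, 1, 1]
print(ou_mvm(W, x, 4, 2))  # ([16, 20], 1): whole crossbar at once
print(ou_mvm(W, x, 2, 1))  # ([16, 20], 4): 2x1 OUs
```

Both calls produce the same result. With 2×1 OUs, however, the product takes four cycles instead of one. That cost is why OU-aware mapping and scheduling matter for latency.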

Thermal-Aware Layout Optimization and Mapping Methods for Resistive Neuromorphic Engines

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/asp-dac52403.2022.9712596

Chengrui Zhang, Yu Ma, Pingqiang Zhou

Citations: 2

SPRoute 2.0: A detailed-routability-driven deterministic parallel global router with soft capacity

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/ASP-DAC52403.2022.9712557

Jiayuan He, U. Agarwal, Yihang Yang, R. Manohar, K. Pingali

Abstract: Global routing has become more challenging due to advancing technology nodes and the ever-increasing size of chips. Global routing needs to generate routing guides such that (1) the routability of detailed routing is considered and (2) the routing is deterministic and fast. In this paper, we first introduce soft capacity, which reserves routing space for detailed routing based on pin density and Rectangular Uniform wire Density (RUDY). Second, we propose a deterministic parallelization approach that partitions the netlist into batches and then bulk-synchronously maze-routes one batch of nets at a time. The advantage of this approach is that it guarantees determinacy without requiring the nets routed in parallel to be disjoint, thus ensuring scalability. We then design a scheduler that mitigates the load imbalance and livelock issues in this bulk-synchronous execution model. We implement SPRoute 2.0 with the proposed methodology. Experimental results on the ICCAD 2019 contest benchmarks show that SPRoute 2.0 achieves good quality of results, with 43% fewer shorts, 14% fewer DRCs, and a 7.4× speedup over a state-of-the-art global router.

Citations: 5
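RUDY is a widely used congestion estimate. It spreads each net's expected wirelength, taken as the half-perimeter $w + h$ of its bounding box, uniformly over that box, which gives a density of $(w + h)/(w h)$ per unit area. The sketch below accumulates that density over a grid of unit gcells and clamps degenerate (zero-width or zero-height) boxes to one gcell, which is an assumption. The `soft_capacity` rule is hypothetical. The abstract says only that soft capacity reserves space based on pin density and RUDY, so the linear weights `alpha` and `beta` are invented for illustration and are not SPRoute 2.0's formula.

```python
import math

def rudy_map(nets, nx, ny):
    """Accumulate RUDY wire density on an nx x ny grid of unit gcells.
    Each net is a list of (x, y) pin coordinates (assumed non-negative)."""
    grid = [[0.0] * nx for _ in range(ny)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        x0, y0 = min(xs), min(ys)
        w = max(max(xs) - x0, 1.0)  # clamp degenerate boxes to one gcell
        h = max(max(ys) - y0, 1.0)
        x1, y1 = x0 + w, y0 + h
        density = (w + h) / (w * h)  # half-perimeter wirelength per unit area
        for gy in range(int(y0), min(ny, math.ceil(y1))):
            for gx in range(int(x0), min(nx, math.ceil(x1))):
                # Overlap area between the net's bounding box and this gcell.
                ox = min(x1, gx + 1) - max(x0, gx)
                oy = min(y1, gy + 1) - max(y0, gy)
                if ox > 0 and oy > 0:
                    grid[gy][gx] += density * ox * oy
    return grid

def soft_capacity(capacity, rudy, pin_density, alpha=0.1, beta=0.5):
    """Hypothetical rule: reserve capacity in proportion to RUDY and pin density."""
    return [[max(0.0, capacity - alpha * r - beta * p)
             for r, p in zip(rudy_row, pin_row)]
            for rudy_row, pin_row in zip(rudy, pin_density)]

print(rudy_map([[(0, 0), (2, 1)]], 3, 2))  # [[1.5, 1.5, 0.0], [0.0, 0.0, 0.0]]
```

For a single two-pin net from (0, 0) to (2, 1), the density is $(2 + 1)/(2 \cdot 1) = 1.5$ in each of the two covered gcells. The per-gcell values sum to the net's half-perimeter wirelength of 3.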

Accelerate SAT-based ATPG via Preprocessing and New Conflict Management Heuristics

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/ASP-DAC52403.2022.9712573

Jun Huang, Hui-Ling Zhen, Naixing Wang, Mingxuan Yuan, Hui Mao, Yu Huang, Jiping Tao

Abstract: Due to the continuous advancement of semiconductor technologies, manufactured chips contain more defects, more widely distributed, than ever before. To meet high product quality and low defective-parts-per-million (DPPM) goals, the Boolean Satisfiability (SAT) technique has been shown to be a robust alternative to conventional ATPG techniques, especially for hard-to-detect faults. However, SAT-based ATPG still faces two challenges. The first is reducing the extra computational overhead of SAT modeling, i.e., of transforming a circuit testing problem into Conjunctive Normal Form (CNF), which is the foundation of modern SAT solvers. The second is the SAT solver's efficiency, which suffers from the loss of structural information during CNF transformation. In this work, we propose a new SAT-based ATPG approach that addresses both challenges. (1) To reduce CNF transformation overhead, we use simulation-driven preprocessing to narrow down the fault propagation and activation logic cones, which improves CNF transformation and reduces runtime. (2) To further improve solving efficiency, we propose new ranking-based heuristics to build a more effective conflict database, enabling direct solving for small-scale instances and a look-ahead method for large-scale ones. Extensive experimental results on industrial circuits demonstrate that, on average, the proposed approach can cover 89.67% of the faults on which a commercial ATPG tool fails, with comparable runtime.

Citations: 3
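The ATPG abstract refers to transforming a circuit testing problem into CNF. The textbook way to do this is the Tseitin encoding, shown below as a sketch rather than the paper's flow. Each gate gets a variable and a few clauses. A faulty copy of the circuit fixes the faulty line to its stuck-at value, and an XOR miter requires the fault-free and faulty outputs to differ. The example circuit, the DIMACS-style variable numbering, and the brute-force solver are illustrative assumptions; the solver stands in for a real CDCL SAT solver. The paper's simulation-driven preprocessing and ranking-based conflict heuristics are not modeled.

```python
from itertools import product

def and_gate(c, a, b):
    """Tseitin clauses for c <-> (a AND b); literals are signed ints."""
    return [[-a, -b, c], [a, -c], [b, -c]]

def or_gate(c, a, b):
    """Tseitin clauses for c <-> (a OR b)."""
    return [[a, b, -c], [-a, c], [-b, c]]

def xor_gate(d, x, y):
    """Tseitin clauses for d <-> (x XOR y), used as the miter output."""
    return [[-x, -y, -d], [x, y, -d], [x, -y, d], [-x, y, d]]

def brute_force_sat(clauses, n_vars):
    """Tiny exhaustive solver for illustration only; returns a model or None."""
    for bits in product([False, True], repeat=n_vars):
        # A literal l is satisfied when variable |l| has polarity (l > 0).
        if all(any(bits[abs(l) - 1] == (l > 0) for l in cl) for cl in clauses):
            return bits
    return None

# Circuit: n1 = a AND b; out = n1 OR c. Target fault: n1 stuck-at-0.
A, B, C, N1, OUT, N1F, OUTF, D = range(1, 9)
cnf = (and_gate(N1, A, B) + or_gate(OUT, N1, C)   # fault-free copy
       + [[-N1F]] + or_gate(OUTF, N1F, C)         # faulty copy: n1 forced to 0
       + xor_gate(D, OUT, OUTF) + [[D]])          # outputs must differ
model = brute_force_sat(cnf, 8)
test_vector = {"a": model[A - 1], "b": model[B - 1], "c": model[C - 1]}
print(test_vector)  # {'a': True, 'b': True, 'c': False}
```

Any satisfying assignment is a test pattern. Here the only way to expose `n1` stuck-at-0 is to set a = b = 1, which activates the fault, and c = 0, which propagates it through the OR gate. That is the assignment the solver returns.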

Mapping Large Scale Finite Element Computing on to Wafer-Scale Engines

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date: 2022-01-17 DOI: 10.1109/asp-dac52403.2022.9712538

Yishuang Lin, Rongjian Liang, Yaguang Li, Hailiang Hu, Jiang Hu

Citations: 0