2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Papers

Exploiting HLS-Generated Multi-Version Kernels to Improve CPU-FPGA Cloud Systems
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431557
Bernardo Neuhaus Lignati, M. Jordan, Guilherme Korol, M. B. Rutzig, A. C. S. Beck
Abstract: Cloud warehouses have been exploiting CPU-FPGA collaborative execution environments, where multiple clients share the same infrastructure to maximize resource utilization with the highest possible energy efficiency and scalability. However, resource provisioning is challenging in these environments, since kernels may be dispatched to both CPU and FPGA concurrently in a highly variable scenario in terms of available resources and workload characteristics. In this work, we propose MultiVers, a framework that leverages automatic HLS generation to enable further gains in such CPU-FPGA collaborative systems. MultiVers exploits automatic generation from HLS to build libraries containing multiple versions of each incoming kernel request, greatly enlarging the design space that the cloud provider's allocation strategies can explore and optimize. MultiVers makes kernel multiversioning and the allocation strategy work symbiotically, allowing fine-tuning in terms of resource usage, performance, energy, or any combination of these parameters. We show the efficiency of MultiVers using real-world cloud request scenarios with a diversity of benchmarks, achieving average improvements in makespan and energy of up to 4.62× and 19.04×, respectively, over traditional allocation strategies executing non-optimized kernels.
Citations: 1
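The multi-version idea in the MultiVers abstract above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `KernelVersion` fields, the LUT-based resource budget, the weighted cost function, and the numbers in `library` are hypothetical and are not taken from the paper or its allocation strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelVersion:
    """One HLS-generated variant of a kernel (all fields are hypothetical)."""
    name: str
    luts: int           # FPGA resources the variant occupies
    latency_ms: float   # estimated execution time
    energy_mj: float    # estimated energy per run


def pick_version(versions, free_luts, w_latency=1.0, w_energy=0.0):
    """Return the variant that fits in free_luts with the lowest weighted cost.

    Returns None when no variant fits the available resources.
    """
    feasible = [v for v in versions if v.luts <= free_luts]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda v: w_latency * v.latency_ms + w_energy * v.energy_mj,
    )


# A made-up library with three variants of the same kernel.
library = [
    KernelVersion("baseline", 10_000, 40.0, 90.0),
    KernelVersion("unrolled_x4", 35_000, 12.0, 60.0),
    KernelVersion("pipelined", 20_000, 18.0, 35.0),
]

if __name__ == "__main__":
    # Same request, different amounts of free FPGA area and different goals.
    print(pick_version(library, free_luts=25_000))
    print(pick_version(library, free_luts=50_000))
    print(pick_version(library, free_luts=50_000, w_latency=0.0, w_energy=1.0))
```

The point of the sketch is only that a larger library gives the allocator more feasible choices as free resources and optimization goals change; a real allocator would also have to account for CPU execution and concurrent requests.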
Puncturing the memory wall: Joint optimization of network compression with approximate memory for ASR application
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431512
Qin Li, Peiyan Dong, Zijie Yu, Changlu Liu, F. Qiao, Yanzhi Wang, Huazhong Yang
Abstract: The automatic speech recognition (ASR) system is becoming increasingly irreplaceable in smart speech interaction applications. Nonetheless, these applications confront the memory wall when embedded in energy- and memory-constrained Internet of Things devices. Therefore, it is extremely challenging but imperative to design a memory-saving and energy-saving ASR system. This paper proposes a jointly optimized scheme of network compression with approximate memory for an economical ASR system. At the algorithm level, this work presents block-based pruning and quantization with error model (BPQE), an optimized compression framework including a novel pruning technique coordinated with low-precision quantization and the approximate memory scheme. The BPQE-compressed recurrent neural network (RNN) model comes with an ultra-high compression rate and a fine-grained structured pattern that greatly reduce the amount of memory access. At the hardware level, this work presents an ASR-adapted incremental retraining method to obtain further power savings. This retraining method makes fuller use of the approximate memory scheme while maintaining considerable accuracy. According to the experimental results, the proposed jointly optimized scheme achieves 58.6% power saving and 40× memory saving with a phone error rate of 20%.
Citations: 1
Boosting Pin Accessibility Through Cell Layout Topology Diversification
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431567
Suwan Kim, Kyeongrok Jo, Taewhan Kim
Abstract: As the layout of standard cells becomes denser, accessing pins is much harder in detailed routing. The conventional solutions to the pin access issue attempt cell flipping, cell shifting, cell swapping, and/or cell dilating in the placement optimization stage, expecting to acquire high pin accessibility. However, those solutions do not guarantee close-to-100% pin accessibility to ensure safe manual fixing afterward in the routing stage. Furthermore, there is as yet no easy and effective methodology to fix the inaccessibility in the detailed routing stage. This work addresses the problem of fixing the inaccessibility in the detailed routing stage. Precisely, (1) we produce, for each type of cell, multiple layouts with diverse pin locations and access points by modifying the core engines, i.e., gate poly ordering and middle-of-line dummy insertion, in the flow of design-technology co-optimization based automatic cell layout generation. Then, (2) we propose a systematic method to make use of those layouts to fix the routing failures caused by pin inaccessibility in the ECO (Engineering Change Order) routing stage. Experimental results demonstrate that our proposed cell layout diversification and replacement approach can fix 93.22% of metal-2 shorts in the ECO routing stage.
Citations: 5
Sub-10-µm Coil Design for Multi-Hop Inductive Coupling Interface
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431649
Tatsuo Omori, K. Shiba, M. Hamada, T. Kuroda
Abstract: Sub-10-µm on-chip coils are designed and prototyped for the multi-hop inductive coupling interface in a 40-nm CMOS. Multi-layer coils and a new receiver circuit are employed to compensate for the decrease of the coupling coefficient due to the small coil size. The prototype emulates a 3D stacked module with 8 dies in a 7-nm CMOS and shows that a 0.1-pJ/bit and 41-Tb/s/mm² inductive coupling interface is achievable.
Citations: 0
A DSM-based Polar Transmitter with 23.8% System Efficiency
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431653
Yuncheng Zhang, Bangan Liu, Xiaofan Gu, Chun Wang, A. Shirane, K. Okada
Abstract: An energy-efficient digital polar transmitter (TX) based on a 1.5-bit Delta-Sigma modulator (DSM) and a fractional-N injection-locked phase-locked loop (IL-PLL) is proposed. In the proposed TX, redundant charge and discharge of turned-off capacitors in the conventional switched-capacitor power amplifiers (SCPAs) are avoided, which drastically improves the efficiency at power back-off. In the PLL, a spur-mitigation technique is proposed to reduce the frequency mismatch between the oscillator and the reference. The transmitter, implemented in 65-nm CMOS, achieves a PAE of 29% at an EVM of -25.1 dB, and a system efficiency of 23.8%.
Citations: 0
Area Efficient Functional Locking through Coarse Grained Runtime Reconfigurable Architectures
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431601
Jianqi Chen, Benjamin Carrión Schäfer
Abstract: The protection of Intellectual Property (IP) has emerged as one of the most important issues in the hardware design industry. Most VLSI design companies are now fabless and need to protect their IP from being illegally distributed. One of the main approaches to address this has been logic locking. Logic locking prevents IPs from being reverse engineered and prevents untrusted foundries from overbuilding the hardware circuit. One of the main problems with existing logic locking techniques is that the foundry has full access to the entire design, including the logic locking mechanism. Because of the importance of this topic, more robust locking mechanisms are continuously proposed, and new methods to break them appear equally fast. One alternative approach is to lock a circuit through omission. The main idea is to selectively map a portion of the IP onto an embedded FPGA (eFPGA). Because the foundry does not have access to the bitstream, the circuit cannot be used until programmed by the legitimate user. One of the main problems with this approach is the large overhead in terms of area and power, as well as timing degradation. Area is especially a concern for price-sensitive applications. To address this, in this work we present a method to map portions of a design onto a Coarse Grained Runtime Reconfigurable Architecture (CGRRA) such that multiple parts of a design can be hidden in the CGRRA, substantially amortizing the area overhead introduced by the CGRRA.
Citations: 6
Fast and Efficient Constraint Evaluation of Analog Layout Using Machine Learning Models
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431547
Tonmoy Dhar, Jitesh Poojary, Yaguang Li, K. Kunal, Meghna Madhusudan, A. Sharma, Susmita Dey Manasi, Jiang Hu, R. Harjani, S. Sapatnekar
Abstract: Placement algorithms for analog circuits explore numerous layout configurations in their iterative search. To steer these engines towards layouts that meet the electrical constraints on the design, this work develops a fast feasibility predictor to guide the layout engine. The flow first discerns rough bounds on layout parasitics and prunes the feature space. Next, a Latin hypercube sampling technique is used to sample the reduced search space, and the labeled samples are classified by a linear support vector machine (SVM). If necessary, a denser sample set is used for the SVM, or if the constraints are found to be nonlinear, a multilayer perceptron (MLP) is employed. The resulting machine learning model is demonstrated to rapidly evaluate candidate placements in a placer, and is used to build layouts for several analog blocks.
Citations: 6
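The abstract above names Latin hypercube sampling as the way the reduced feature space is sampled. As a rough illustration of that one step, the sketch below draws `n_samples` points so that each dimension has exactly one point in each of `n_samples` equal-width strata. The function name, the `bounds` format, and the example parasitic ranges are assumptions for illustration; the paper's actual features, labeling, and SVM/MLP training are not reproduced here.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points with one point per stratum in every dimension.

    bounds is a list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly random point inside each of the n equal-width strata.
        points = [
            low + (high - low) * (i + rng.random()) / n_samples
            for i in range(n_samples)
        ]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical two-parameter space, e.g. bounds on two layout parasitics.
    for point in latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0)]):
        print(point)
```

Compared with plain uniform sampling, this guarantees that every range is covered along each axis with few samples, which is presumably why it suits a flow where each label requires an expensive circuit evaluation.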
Multi-FPGA Co-optimization: Hybrid Routing and Competitive-based Time Division Multiplexing Assignment
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431565
Dan Zheng, Xiaopeng Zhang, Chak-Wa Pui, Evangeline F. Y. Young
Abstract: In multi-FPGA systems, time-division multiplexing (TDM) is a widely used technique to transfer signals between FPGAs. While TDM can greatly increase logic utilization, the inter-FPGA delay will also become longer. A good time-multiplexing scheme for inter-FPGA signals is very important for optimizing the system performance. In this work, we propose a fast algorithm to generate high-quality time-multiplexed routing results for multi-FPGA systems. A hybrid routing algorithm is proposed to route the nets between FPGAs, by maze routing and by a fast minimum terminal spanning tree method. After obtaining a routing topology, a two-step method is applied to perform TDM assignment to optimize timing, which includes an initial assignment and a competitive-based refinement. Experiments show that our system-level routing and TDM assignment algorithm can outperform both the top winner of the ICCAD 2019 Contest and the state-of-the-art methods. Moreover, compared to the state-of-the-art works [17], [22], our approach improves run time by more than 2× with better or comparable TDM performance.
Citations: 2
Efficient Computing Platform Design for Autonomous Driving Systems
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431620
Shuang Liang, Changcheng Tang, Xuefei Ning, Shulin Zeng, Jincheng Yu, Yu Wang, Kaiyuan Guo, Diange Yang, Tianyi Lu, Huazhong Yang
Abstract: Autonomous driving is becoming a hot topic in both academic and industrial communities. Traditional algorithms can hardly accomplish the complex tasks and meet the high safety criteria. Recent research on deep learning shows significant performance improvement over traditional algorithms, and deep learning is believed to be a strong candidate for autonomous driving systems. Despite the attractive performance, deep learning does not solve the problem completely. The application scenario requires that an autonomous driving system work in real time to ensure safety. But the high computational complexity of neural network models, together with complicated pre-processing and post-processing, brings great challenges. System designers need to do dedicated optimizations to make a practical computing platform for autonomous driving. In this paper, we introduce our work on efficient computing platform design for autonomous driving systems. At the software level, we introduce neural network compression and hardware-aware architecture search to reduce the workload. At the hardware level, we propose customized hardware accelerators for the pre- and post-processing of deep learning algorithms. Finally, we introduce the hardware platform design, NOVA-30, and our on-vehicle evaluation project.
Citations: 1
High-Level Synthesis of Transactional Memory
2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2021-01-18 DOI: 10.1145/3394885.3431556
Omar Ragheb, J. Anderson
Abstract: The rising popularity of high-level synthesis (HLS) is due to the complexity and amount of background knowledge required to design hardware circuits. Despite significant recent advances in HLS research, HLS-generated circuits may be of lower quality than human-expert-designed circuits, from the performance, power, or area perspectives. In this work, we aim to raise circuit performance by introducing a transactional memory (TM) synchronization model to the open-source LegUp HLS tool [1]. LegUp HLS supports the synthesis of multi-threaded software into parallel hardware [4], including support for mutual-exclusion lock-based synchronization. With the introduction of transactional memory-based synchronization, location-specific (i.e., finer-grained) memory locks are made possible, where instead of placing an access lock around an entire array, one can place a lock around individual array elements. Significant circuit performance improvements are observed through reduced stalls due to contention, and greater memory-access parallelism. On a set of 5 parallel benchmarks, wall-clock time is improved by 2.0×, on average, by the TM synchronization model vs. mutex-based locks.
Citations: 0
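The contrast the abstract above draws, one lock around an entire array versus a lock per array element, can be illustrated with a small software analogy. This is only a sketch of lock granularity using Python's `threading` module: it is not transactional memory, it is not how LegUp synthesizes hardware, and the `LockedArray` class and `run` helper are hypothetical names introduced here.

```python
import threading


class LockedArray:
    """Integer array guarded by one coarse lock or by one lock per element."""

    def __init__(self, size, fine_grained=True):
        self.data = [0] * size
        n_locks = size if fine_grained else 1
        self._locks = [threading.Lock() for _ in range(n_locks)]

    def add(self, index, amount):
        # Coarse mode: every index maps to the single shared lock.
        # Fine mode: each index has its own lock, so updates to different
        # elements never wait on each other.
        lock = self._locks[index % len(self._locks)]
        with lock:
            self.data[index] += amount


def run(array, n_threads=4, updates_per_thread=1000):
    """Have several threads increment elements; return the sum of all elements."""

    def worker(tid):
        for k in range(updates_per_thread):
            array.add((tid + k) % len(array.data), 1)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(array.data)


if __name__ == "__main__":
    print(run(LockedArray(8, fine_grained=False)))  # one lock for the array
    print(run(LockedArray(8, fine_grained=True)))   # one lock per element
```

Both configurations produce the same totals; the difference is that with per-element locks, threads touching different elements do not contend. In CPython the global interpreter lock hides most of the timing benefit, so the sketch shows the structure of the idea rather than the speedup reported for hardware.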