{"title":"HL-Pow: A Learning-Based Power Modeling Framework for High-Level Synthesis","authors":"Zhe Lin, Jieru Zhao, Sharad Sinha, Wei Zhang","doi":"10.1109/ASP-DAC47756.2020.9045442","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045442","url":null,"abstract":"High-level synthesis (HLS) enables designers to customize hardware designs efficiently. However, it is still challenging to foresee the correlation between power consumption and HLS-based applications at an early design stage. To overcome this problem, we introduce HL-Pow, a power modeling framework for FPGA HLS based on state-of-the-art machine learning techniques. HL-Pow incorporates an automated feature construction flow to efficiently identify and extract features that exert a major influence on power consumption, simply based upon HLS results, and a modeling flow that can build an accurate and generic power model applicable to a variety of designs with HLS. By using HL-Pow, the power evaluation process for FPGA designs can be significantly expedited because the power inference of HL-Pow is established on HLS instead of the time-consuming register-transfer level (RTL) implementation flow. Experimental results demonstrate that HL-Pow can achieve accurate power modeling that is only 4.67% (24.02 mW) away from onboard power measurement. To further facilitate power-oriented optimizations, we describe a novel design space exploration (DSE) algorithm built on top of HL-Pow to trade off between latency and power consumption. 
This algorithm can reach a close approximation of the real Pareto frontier while only requiring running HLS flow for 20% of design points in the entire design space.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123480309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EFFORT: Enhancing Energy Efficiency and Error Resilience of a Near-Threshold Tensor Processing Unit","authors":"N. D. Gundi, Tahmoures Shabanian, Prabal Basu, Pramesh Pandey, Sanghamitra Roy, Koushik Chakraborty, Zhen Zhang","doi":"10.1109/ASP-DAC47756.2020.9045479","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045479","url":null,"abstract":"Modern deep neural network (DNN) applications demand a remarkable processing throughput usually unmet by traditional Von Neumann architectures. Consequently, hardware accelerators, comprising a sea of multiplier and accumulate (MAC) units, have recently gained prominence in accelerating DNN inference engine. For example, Tensor Processing Units (TPU) account for a lion’s share of Google’s datacenter inference operations. The proliferation of real-time DNN predictions is accompanied with a tremendous energy budget. In quest of trimming the energy footprint of DNN accelerators, we propose EFFORT—an energy optimized, yet high performance TPU architecture, operating at the Near-Threshold Computing (NTC) region. EFFORT promotes a better-than-worst-case design by operating the NTC TPU at a substantially high frequency while keeping the voltage at the NTC nominal value. In order to tackle the timing errors due to such aggressive operation, we employ an opportunistic error mitigation strategy. Additionally, we implement an in-situ clock gating architecture, drastically reducing the MACs’ dynamic power consumption. 
Compared to a cutting-edge error mitigation technique for TPUs, EFFORT enables up to 2.5× better performance at NTC with only 2% average accuracy drop across 3 out of 4 DNN datasets.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132662363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contention Minimized Bypassing in SMART NoC","authors":"Peng Chen, Weichen Liu, Mengquan Li, Lei Yang, Nan Guan","doi":"10.1109/ASP-DAC47756.2020.9045103","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045103","url":null,"abstract":"SMART, a recently proposed dynamically reconfigurable NoC, enables single-cycle long-distance communication by building single-bypass paths. However, such a single-cycle single-bypass path will be broken when contention occurs. Thus, lower-priority packets will be buffered at intermediate routers with blocking latency from higher-priority packets, and extra router-stage latency to rebuild remaining path, reducing the bypassing benefits that SMART offers. In this paper, we for the first time propose an effective routing strategy to achieve nearly contention-free bypassing in SMART NoC. Specifically, we identify two different routes for communication pairs: direct route, with which data can reach the destination in a single bypass; and indirect route, with which data can reach the destination in two bypasses via an intermediate router. If a direct route is not found, we would alternatively resort to an indirect route in advance to eliminate the blocking latency, at the cost of only one router-stage latency. Compared with the current routing, our new approach can effectively isolate conflicting communication pairs, greatly balance the traffic loads and fully utilize bypass paths. 
Experiments show that our approach makes 22.6% performance improvement on average in terms of communication latency.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134599021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Automatic Hardware Synthesis from Formal Specification to Implementation","authors":"Fritjof Bornebusch, Christoph Lüth, R. Wille, R. Drechsler","doi":"10.1109/ASP-DAC47756.2020.9045406","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045406","url":null,"abstract":"In this work, we sketch an automated design flow for hardware synthesis based on a formal specification. Verification results are propagated from the FSL level through the proposed flow to generate an ESL model as well as an RTL implementation automatically. In contrast, the established design flow relies on manual implementations at the ESL and RTL level. The proposed design flow combines proof assistants with functional hardware description languages. This combination decreases the implementation effort significantly and the generation of test benches is no longer needed. We illustrate our design flow by specifying and synthesizing a set of benchmarks that contain sequential and combinational hardware designs. We compare them with implementations required by the established hardware design flow.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"2017 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132679907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent Monitoring of Operational Health in Neural Networks Through Balanced Output Partitions","authors":"Elbruz Ozen, A. Orailoglu","doi":"10.1109/ASP-DAC47756.2020.9045662","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045662","url":null,"abstract":"The abundant usage of deep neural networks in safety-critical domains such as autonomous driving raises concerns regarding the impact of hardware-level faults on deep neural network computations. As a failure can prove to be disastrous, low-cost safety mechanisms are needed to check the integrity of the deep neural network computations. We embed safety checksums into deep neural networks by introducing a custom regularization term in the network training. We partition the outputs of each network layer into two groups and guide the network to balance the summation of these groups through an additional penalty term in the cost function. The proposed approach delivers twin benefits. While the embedded checksums deliver low-cost detection of computation errors upon violations of the trained equilibrium during network inference, the regularization term enables the network to generalize better during training by preventing overfitting, thus leading to significantly higher network accuracy.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130964142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of Fluid Loading on Programmable Microfluidic Devices for Bio-protocol Execution","authors":"Satoru Maruyama, Debraj Kundu, S. Yamashita, Sudip Roy","doi":"10.1109/ASP-DAC47756.2020.9045675","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045675","url":null,"abstract":"Recently, Programmable Microfluidic Device (PMD) has got an attention of the design automation communities as a new type of microfluidic biochips. For the design of PMD chips, one of the important tasks is to minimize the number of flows for loading the reactant fluids into specific cells (by creating some flows of the fluids) before the bio-protocol is executed. Nevertheless of the importance of the problem, there has been almost no work to study this problem. Thus, in this paper, we intensively study this fluid loading problem in PMD chips. First, we successfully formulate the problem as a constraint satisfaction problem (CSP) to solve the problem optimally for the first time. Then, we also propose an efficient heuristic called Determining Flows from the Last (DFL) method for larger problem instances. DFL is based on a novel idea that it is better to determine the flows from the last flow unlike the state-of-the-art method Fluid Loading Algorithm for PMD (FLAP) [Gupta et al., TODAES, 2019]. 
Simulation results confirm that the exact method can find the optimal solutions for practical test cases, whereas our heuristic can find near-optimal solutions, which are better than those obtained by FLAP.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"59 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127609836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelism in Deep Learning Accelerators","authors":"Linghao Song, Fan Chen, Yiran Chen, H. Li","doi":"10.1109/ASP-DAC47756.2020.9045206","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045206","url":null,"abstract":"Deep learning is the core of artificial intelligence and it achieves state-of-the-art in a wide range of applications. The intensity of computation and data in deep learning processing poses significant challenges to the conventional computing platforms. Thus, specialized accelerator architectures are proposed for the acceleration of deep learning. In this paper, we classify the design space of current deep learning accelerators into three levels, (1) processing engine, (2) memory and (3) accelerator, and present a constructive view from a perspective of parallelism in the three levels.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132824656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MindReading: An Ultra-Low-Power Photonic Accelerator for EEG-based Human Intention Recognition","authors":"Qian Lou, Wenyang Liu, Weichen Liu, Feng Guo, Lei Jiang","doi":"10.1109/ASP-DAC47756.2020.9045333","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045333","url":null,"abstract":"A scalp-recording electroencephalography (EEG)-based brain-computer interface (BCI) system can greatly improve the quality of life for people who suffer from motor disabilities. Deep neural networks consisting of multiple convolutional, LSTM and fully-connected layers are created to decode EEG signals to maximize the human intention recognition accuracy. However, prior FPGA, ASIC, ReRAM and photonic accelerators cannot maintain sufficient battery lifetime when processing realtime intention recognition. In this paper, we propose an ultra-low-power photonic accelerator, MindReading, for human intention recognition by only low bit-width addition and shift operations. Compared to prior neural network accelerators, to maintain the real-time processing throughput, MindReading reduces the power consumption by 62.7% and improves the throughput per Watt by 168%.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121456723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrency in DD-based Quantum Circuit Simulation","authors":"S. Hillmich, Alwin Zulehner, R. Wille","doi":"10.1109/ASP-DAC47756.2020.9045711","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045711","url":null,"abstract":"Despite recent progress in physical implementations of quantum computers, a significant amount of research still depends on simulating quantum computations on classical computers. Here, most state-of-the-art simulators rely on array-based approaches which are perfectly suited for acceleration through concurrency using multi- or many-core processors. However, those methods have exponential memory complexities and, hence, become infeasible if the considered quantum circuits are too large. To address this drawback, complementary approaches based on decision diagrams (called DD-based simulation) have been proposed which provide more compact representations in many cases. While this allows to simulate quantum circuits that could not be simulated before, it is unclear whether DD-based simulation also allows for similar acceleration through concurrency as array-based approaches. In this work, we investigate this issue. The resulting findings provide a better understanding about when DD-based simulation can be accelerated through concurrent executions of sub-tasks and when not.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122646633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An FPGA based Network Interface Card with Query Filter for Storage Nodes of Big Data Systems","authors":"Ying Li, Jinyu Zhan, Wei Jiang, Junting Wu, Jianping Zhu","doi":"10.1109/ASP-DAC47756.2020.9045372","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045372","url":null,"abstract":"In this paper, we are interested in improving the data processing of storage and computing separated Big Data systems. We propose an Field Programmable Gate Array (FPGA) based Network Interface Card with Query Filter (NIC-QF) to accelerate the data query efficiency of storage nodes and reduce the workloads of computing nodes and the communication overheads between them. NIC-QF designed with PCIe core, query filter and NIC communication can filter the original data on storage nodes as an implicit coprocessor and directly send the filtered data to computing nodes of Big Data systems. Filter units in query filter can perform multiple SQL tasks in parallel, and each filter unit is internally pipelined, which can further speed up the data processing. Filter units can be designed to support general SQL queries on different data formats and we implement two schemes for TextFile and RCFile separately. 
Based on TPC-H benchmark and Tencent data set, we conduct extensive experiments to evaluate our design, which can achieve averagely up to 46.91% faster than the traditional approach.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115830713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}