{"title":"Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search","authors":"Shulin Zeng, Hanbo Sun, Yu Xing, Xuefei Ning, Yi Shan, Xiaoming Chen, Yu Wang, Huazhong Yang","doi":"10.1109/ASP-DAC47756.2020.9045179","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045179","url":null,"abstract":"Neural Architecture Search (NAS) is a promising approach to discover good neural network architectures for given applications. Among the three basic components in a NAS system (search space, search strategy, and evaluation), prior work mainly focused on the development of different search strategies and evaluation methods. As most of the previous hardware-aware search space designs aimed at CPUs and GPUs, it still remains a challenge to design a suitable search space for Deep Neural Network (DNN) accelerators. Besides, the architectures and compilers of DNN accelerators vary greatly, so it is quite difficult to get a unified and accurate evaluation of the latency of DNN across different platforms. To address these issues, we propose a black box profiling-based search space tuning method and further improve the latency evaluation by introducing a layer adaptive latency correction method. Used as the first stage in our general accelerator-aware NAS pipeline, our proposed methods could provide a smaller and dynamic search space with a controllable trade-off between accuracy and latency for DNN accelerators. Experimental results on CIFAR-10 and ImageNet demonstrate our search space is effective with up to 12.7% improvement in accuracy and 2.2x reduction of latency, and also efficient by reducing the search time and GPU memory up to 4.35x and 6.25x, respectively.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132634059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tolerating Retention Failures in Neuromorphic Fabric based on Emerging Resistive Memories","authors":"Christopher Münch, R. Bishnoi, M. Tahoori","doi":"10.1109/ASP-DAC47756.2020.9045339","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045339","url":null,"abstract":"In recent years, computation is shifting from conventional high performance servers to Internet of Things (IoT) edge devices, most of which require the processing of cognitive tasks. Hence, a great effort is put in the realization of neural network (NN) edge devices and their efficiency in inferring a pretrained Neural Network. In this paper, we evaluate the retention issues of emerging resistive memories used as non-volatile weight storage for embedded NN. We exploit the asymmetric retention behavior of Spintronic based Magnetic Tunneling Junctions (MTJs), which is also present in other resistive memories like Phase-Change memory (PCM) and ReRAM, to optimize the retention of the NN accuracy over time. We propose mixed retention cell arrays and an adapted training scheme to achieve a trade-off between array size and the reliable long-term accuracy of NNs. The results of our proposed method save up to 24% of inference accuracy of an MNIST trained Multi-Layer-Perceptron on MTJ-based crossbars.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134244032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliable Power Grid Network Design Framework Considering EM Immortalities for Multi-Segment Wires","authors":"Han Zhou, Shuyuan Yu, Zeyu Sun, S. Tan","doi":"10.1109/ASP-DAC47756.2020.9045673","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045673","url":null,"abstract":"This paper presents a new power grid network design and optimization technique that considers the new EM immortality constraint due to EM void saturation volume for multi-segment interconnects. Void may grow to its saturation volume without changing the wire resistance significantly. However, this phenomenon was ignored in existing EM-aware optimization methods. By considering this new effect, we can remove more conservativeness in the EM-aware on-chip power grid design. Along with recently proposed nucleation phase immortality constraint for multi-segment wires, we show that both EM immortality constraints can be naturally integrated into the existing programming based power grid optimization framework. To further mitigate the overly conservative problem of existing immortality-constrained optimization methods, we further explore two strategies: first we size up failed wires to meet one of the immorality conditions subject to design rules; second, we consider the EM-induced aging effects on power supply networks for a target lifetime, which allows some short-lifetime wires to fail and optimizes the rest of the wires. Numerical results on a number of IBM-format power grid networks demonstrate that the new method can reduce more power grid area compared to the existing EM-immortality constrained optimizations. Furthermore, the new method can optimize power grids with nucleated wires, which would not be possible with the existing methods.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133321167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Read-Intensive Key-Value Stores with Tidal Structure Based on LSM-Tree","authors":"Yi Wang, Shangyu Wu, Rui Mao","doi":"10.1109/ASP-DAC47756.2020.9045617","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045617","url":null,"abstract":"Key-value store has played a critical role in many large-scale data storage applications. The log-structured merge-tree (LSM-tree) based key-value store achieves excellent performance on write-intensive workloads which is mainly benefited from the mechanism of converting a batch of random writes into sequential writes. However, LSM-tree doesn’t improve a lot in read-intensive workloads which takes a higher latency. The main reason lies in the hierarchical search mechanism in LSM-tree structure. The key challenge is how to propose new strategies based on the existing LSM-tree structure to improve read efficiency and reduce read amplifications.This paper proposes Tidal-tree, a novel data structure where data flows inside LSM-tree like Tidal waves. Tidal-tree targets at improving read efficiency in read-intensive workloads. Tidal-tree allows frequently accessed files at the bottom of LSM-tree to move to higher positions, thereby reducing read latency. Tidal-tree also makes LSM-tree into a variable shape to cater for different characteristic workloads. To evaluate the performance of Tidal-tree, we conduct a series of experiments using standard benchmarks from YCSB. The experimental results show that Tidal-tree can significantly improve read efficiency and reduce read amplifications compared with representative schemes.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114758528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Search-free Accelerator for Sparse Convolutional Neural Networks","authors":"Bosheng Liu, Xiaoming Chen, Yinhe Han, Ying Wang, Jiajun Li, Haobo Xu, Xiaowei Li","doi":"10.1109/ASP-DAC47756.2020.9045580","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045580","url":null,"abstract":"Sparsification is an efficient solution to reduce the demand of on-chip memory space for deep convolutional neural networks (CNNs). Most of state-of-the-art CNN accelerators can deliver high throughput for sparse CNNs by searching pairs of nonzero weights and activations, and then sending them to processing elements (PEs) for multiplication-accumulation (MAC) operations. However, their PE scales are difficult to be increased for superior and efficient computing because of the significant internal interconnect and memory bandwidth consumption. To deal with this dilemma, we propose a sparsity-aware architecture, called Swan, which frees the search process for sparse CNNs under limited interconnect and bandwidth resources. The architecture comprises two parts: a MAC unit that can free the search operation for the sparsity-aware MAC calculation, and a systolic compressive dataflow that well suits the MAC architecture and greatly reuses inputs for interconnect and bandwidth saving. With the proposed architecture, only one column of the PEs needs to load/store data while all PEs can operate in full scale. Evaluation results based on a place-and-route process show that the proposed design, in a compact factor of 4096 PEs, 4.9TOP/s peak performance, and 2.97W power running at 600MHz, achieves 1.5-2.1× speedup and 6.0-9.1× higher energy efficiency than state-of-the-art CNN accelerators with the same PE scale.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127826894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Based Online Full-Chip Heatmap Estimation","authors":"Sheriff Sadiqbatcha, Yue Zhao, Jinwei Zhang, H. Amrouch, J. Henkel, S. Tan","doi":"10.1109/ASP-DAC47756.2020.9045204","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045204","url":null,"abstract":"Runtime power and thermal control is crucial in any modern processor. However, these control schemes require accurate real-time temperature information, ideally of the entire die area, in order to be effective. On-chip temperature sensors alone cannot provide the full-chip temperature information since the number of sensors that are typically available is very limited due to their high area and power overheads. Furthermore, as we will demonstrate, the peak locations within hot-spots are not stationary and are very workload dependent, making it difficult to rely on fixed temperature sensors alone. Therefore, we propose a novel approach to real-time estimation of full-chip transient heatmaps for commercial processors based on machine learning. The model derived in this work supplements the temperature data sensed from the existing on-chip sensors, allowing for the development of more robust runtime power and thermal control schemes that can take advantage of the additional thermal information that is otherwise not available. The new approach involves offline acquisition of accurate spatial and temporal heatmaps using an infrared thermal imaging setup while nominal working conditions are maintained on the chip. To build the dynamic thermal model, we apply Long-Short-Term-Memory (LSTM) neutral networks with system-level variables such as chip frequency, instruction counts, and other performance metrics as inputs. To reduce the dimensionality of the model, 2D spatial discrete cosine transformation (DCT) is first performed on the heatmaps so that they can be expressed with just their dominant DCT frequencies. Our study shows that only 6×6 DCT coefficients are required to maintain sufficient accuracy across a variety of workloads. Experimental results show that the proposed approach can estimate the full-chip heatmaps with less than $1.4^{o}$C root-mean-square-error and take only $sim$19ms for each inference which suits well for real-time use.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127301058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting the Profitability of NVRAM-based Storage Devices via the Concept of Dual-Chunking Data Deduplication","authors":"Shuo-Han Chen, Yu-Pei Liang, Yuan-Hao Chang, H. Wei, W. Shih","doi":"10.1109/ASP-DAC47756.2020.9045622","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045622","url":null,"abstract":"With the latest advance in the non-volatile random-access memory (NVRAM), NVRAM is widely considered as the mainstream for the next-generation storage mediums. NVRAM has numerous attractive features, which include byte addressability, limited idle energy consumption, and great read/write access speed. However, owing to the high manufacturing cost of NVRAM, the incentive of deploying NVRAM in consumer electronics is lowered due to the consideration of profitability. To resolve the profitability issue and bring the benefits of NVRAM into the design of consumer electronics, avoiding storing duplicate data on NVRAM becomes a crucial task for lowering the demand and deployment cost of NVRAM. Such observation motivates us to propose a data deduplication extended file system design (DeEXT) to boost the profitability of NVRAM via the concept of dual-chunking data deduplication while considering the characteristics of NVRAM and duplicate data content. The proposed DeEXT was then evaluated by real-world data deduplication traces with encouraging results.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121904789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Generalization of Wafer Defect Detection by Data Discrepancy-aware Preprocessing and Contrast-varied Augmentation","authors":"Chaofei Yang, H. Li, Yiran Chen, Jiang Hu","doi":"10.1109/ASP-DAC47756.2020.9045391","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045391","url":null,"abstract":"Wafer inspection locates defects at early fabrication stages and traditionally focuses on pixel-level defects. However, there are very few solutions that can effectively detect largescale defects. In this work, we leverage Convolutional Neural Networks (CNNs) to automate the wafer inspection process and propose several techniques to preprocess and augment wafer images for enhancing our model’s generalization on unseen wafers (e.g., from other fabs). Cross-fab experimental results of both wafer-level and pixel-level detections show that the F1 score increases from 0.09 to 0.77 and the Precision-Recall area under curve (PR AUC) increases from 0.03 to 0.62 using our proposed method.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121944169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-Exploring Neural Architecture and Network-on-Chip Design for Real-Time Artificial Intelligence","authors":"Lei Yang, Weiwen Jiang, Weichen Liu, E. Sha, Yiyu Shi, J. Hu","doi":"10.1109/ASP-DAC47756.2020.9045595","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045595","url":null,"abstract":"Hardware-aware Neural Architecture Search (NAS), which automatically finds an architecture that works best on a given hardware design, has prevailed in response to the ever-growing demand for real-time Artificial Intelligence (AI). However, in many situations, the underlying hardware is not pre-determined. We argue that simply assuming an arbitrary yet fixed hardware design will lead to inferior solutions, and it is best to co-explore neural architecture space and hardware design space for the best pair of neural architecture and hardware design. To demonstrate this, we employ Network-on-Chip (NoC) as the infrastructure and propose a novel framework, namely NANDS, to co-explore NAS space and NoC Design Search (NDS) space with the objective to maximize accuracy and throughput. Since two metrics are tightly coupled, we develop a multi-phase manager to guide NANDS to gradually converge to solutions with the best accuracy-throughput tradeoff. On top of it, we propose techniques to detect and alleviate timing performance bottleneck, which allows better and more efficient exploration of NDS space. Experimental results on common datasets, CIFAR10, CIFAR-100 and STL-10, show that compared with state-of-the-art hardware-aware NAS, NANDS can achieve 42.99% higher throughput along with 1.58% accuracy improvement. There are cases where hardware-aware NAS cannot find any feasible solutions while NANDS can.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123203624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}