{"title":"A near-storage framework for boosted data preprocessing of mass spectrum clustering","authors":"Weihong Xu, Jaeyoung Kang, T. Simunic","doi":"10.1145/3489517.3530449","DOIUrl":"https://doi.org/10.1145/3489517.3530449","url":null,"abstract":"Mass spectrometry (MS) has been a key to proteomics and metabolomics due to its unique ability to identify and analyze protein structures. Modern MS equipment generates a massive amount of tandem mass spectra with high redundancy, making spectral analysis the major bottleneck in the design of new medicines. Mass spectrum clustering is one promising solution as it greatly reduces data redundancy and boosts protein identification. However, state-of-the-art MS tools take many hours to run spectrum clustering. Spectra loading and preprocessing consume on average 82% of the execution time and energy during clustering. We propose a near-storage framework, MSAS, to speed up spectrum preprocessing. Instead of loading data into host memory and CPU, MSAS processes spectra near storage, thus reducing the expensive cost of data movement. We present two types of accelerators that leverage internal bandwidth at two storage levels: SSD and channel. The accelerators are optimized to match the data rate at each storage level with negligible overhead. Our results demonstrate that the channel-level design yields the best performance improvement for preprocessing - it is up to 187X and 1.8X faster than the CPU and the state-of-the-art in-storage computing solution, INSIDER, respectively. After integrating channel-level MSAS into existing MS clustering tools, we measure system level improvements in speed of 3.5X to 9.8X with 2.8X to 11.9X better energy efficiency.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123274153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architecting DDR5 DRAM caches for non-volatile memory systems","authors":"Xin Xin, Wanyi Zhu, Li Zhao","doi":"10.1145/3489517.3530570","DOIUrl":"https://doi.org/10.1145/3489517.3530570","url":null,"abstract":"With the release of Intel's Optane DIMM, Non-Volatile Memories (NVMs) are emerging as viable alternatives to DRAM memories because of the advantage of higher capacity. However, the higher latency and lower bandwidth of Optane prevent it from outright replacing DRAM. A prevailing strategy is to employ existing DRAM as a data cache for Optane, thereby achieving overall benefit in capacity, bandwidth, and latency. In this paper, we inspect new features in DDR5 to better support the DRAM cache design for Optane. Specifically, we leverage the two-level ECC scheme, i.e., DIMM ECC and on-die ECC, in DDR5 to construct a narrower channel for tag probing and propose a new operation for fast cache replacement. Experimental results show that our proposed strategy can achieve, on average, 26% performance improvement.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123568898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In-situ self-powered intelligent vision system with inference-adaptive energy scheduling for BNN-based always-on perception","authors":"Maimaiti Nazhamaiti, Haijin Su, Han Xu, Zheyu Liu, F. Qiao, Qi Wei, Zidong Du, Xinghua Yang, Li Luo","doi":"10.1145/3489517.3530554","DOIUrl":"https://doi.org/10.1145/3489517.3530554","url":null,"abstract":"This paper proposes an in-situ self-powered BNN-based intelligent visual perception system that harvests light energy utilizing the indispensable image sensor itself. The harvested energy is allocated to the low-power BNN computation modules layer by layer, adopting a lightweight duty-cycling-based energy scheduler. A software-hardware co-design method, which exploits the layer-wise error tolerance of BNN as well as the computing-error and energy consumption characteristics of the computation circuit, is proposed to determine the parameters of the energy scheduler, achieving high energy efficiency for self-powered BNN inference. Simulation results show that with the proposed inference-adaptive energy scheduling method, self-powered MNIST classification task can be performed at a frame rate of 4 fps if the harvesting power is 1μW, while guaranteeing at least 90% inference accuracy using binary LeNet-5 network.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129696012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HIMap","authors":"Xing Li, Lei Chen, Fan Yang, Mingxuan Yuan, Hongli Yan, Yupeng Wan","doi":"10.1145/3489517.3530460","DOIUrl":"https://doi.org/10.1145/3489517.3530460","url":null,"abstract":"Recently, many models show their superiority in sequence and parameter tuning. However, they usually generate non-deterministic flows and require lots of training data. We thus propose a heuristic and iterative flow, namely HIMap, for deterministic logic synthesis. In HIMap, domain knowledge of the functionality and parameters of synthesis operators and their correlations to netlist PPA is fully utilized to design synthesis templates for various objectives. We also introduce deterministic and effective heuristics to tune the templates with relatively fixed operator combinations and iteratively improve netlist PPA. Two nested iterations with local searching and early stopping can thus generate dynamic sequences for various circuits and reduce runtime. HIMap improves 13 best results of the EPFL combinational benchmarks for delay (5 for area). Especially, for several arithmetic benchmarks, HIMap significantly reduces LUT-6 levels by 11.6 ~ 21.2% and delay after P&R by 5.0 ~ 12.9%.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128592693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GraphRing","authors":"Zerun Li, Xiaoming Chen, Yinhe Han","doi":"10.1145/3489517.3530571","DOIUrl":"https://doi.org/10.1145/3489517.3530571","url":null,"abstract":"Due to the irregular memory access and high bandwidth demand, graph processing is usually inefficient on conventional computer architectures. The recent development of the processing-in-memory (PIM) technique such as hybrid memory cube (HMC) has provided a feasible design direction for graph processing accelerators. Although PIM provides high internal bandwidth, inter-node memory access is inevitable in large-scale graph processing, which greatly affects the performance. In this paper, we propose an HMC-based graph processing framework, GraphRing. GraphRing is a software-hardware codesign framework that optimizes inter-HMC communication. It contains a regularity- and locality-aware graph execution model and a ring-based multi-HMC architecture. The evaluation results based on 5 graph datasets and 4 graph algorithms show that GraphRing achieves on average 2.14× speedup and 3.07× inter-HMC communication energy saving, compared with GraphQ, a state-of-the-art graph processing architecture.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"27 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127920828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-efficient stochastic rounding unit design for DNN training: late breaking results","authors":"Sung-En Chang, Geng Yuan, Alec Lu, Mengshu Sun, Yanyu Li, Xiaolong Ma, Z. Li, Yanyue Xie, Minghai Qin, Xue Lin, Zhenman Fang, Yanzhi Wang","doi":"10.1145/3489517.3530619","DOIUrl":"https://doi.org/10.1145/3489517.3530619","url":null,"abstract":"Stochastic rounding is crucial in the training of low-bit deep neural networks (DNNs) to achieve high accuracy. Unfortunately, prior studies require a large number of high-precision stochastic rounding units (SRUs) to guarantee the low-bit DNN accuracy, which involves considerable hardware overhead. In this paper, we propose an automated framework to explore hardware-efficient low-bit SRUs (ESRUs) that can still generate high-quality random numbers to guarantee the accuracy of low-bit DNN training. Experimental results using state-of-the-art DNN models demonstrate that, compared to the prior 24-bit SRU with 24-bit pseudo random number generator (PRNG), our 8-bit SRU with a 3-bit PRNG reduces the SRU resource usage by 9.75× while achieving a higher accuracy.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121177999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HDPG","authors":"Yang Ni, Mariam Issa, Danny Abraham, M. Imani, Xunzhao Yin, M. Imani","doi":"10.1145/3489517.3530668","DOIUrl":"https://doi.org/10.1145/3489517.3530668","url":null,"abstract":"Traditional robot control or more general continuous control tasks often rely on carefully hand-crafted classic control methods. These models often lack the self-learning adaptability and intelligence to achieve human-level control. On the other hand, recent advancements in Reinforcement Learning (RL) present algorithms that have the capability of human-like learning. The integration of Deep Neural Networks (DNN) and RL thereby enables autonomous learning in robot control tasks. However, DNN-based RL brings both high-quality learning and high computation cost, which is not ideal for today's fast-growing edge computing scenarios. In this paper, we introduce HDPG, a highly efficient policy-based RL algorithm using Hyperdimensional Computing (HDC). Hyperdimensional computing is a lightweight brain-inspired learning methodology; its holistic representation of information leads to a well-defined set of hardware-friendly high-dimensional operations. Our HDPG fully exploits the efficient HDC for high-quality state value approximation and policy gradient update. In our experiments, we use HDPG for robotics tasks with continuous action space and achieve significantly higher rewards than DNN-based RL. Our evaluation also shows that HDPG is 4.7× faster and achieves 5.3× higher energy efficiency than DNN-based RL running on embedded FPGA.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121990820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hexagons are the bestagons: design automation for silicon dangling bond logic","authors":"Marcel Walter, S. S. H. Ng, K. Waluś, R. Wille","doi":"10.1145/3489517.3530525","DOIUrl":"https://doi.org/10.1145/3489517.3530525","url":null,"abstract":"Field-coupled Nanocomputing (FCN) defines a class of post-CMOS nanotechnologies that promises compact layouts, low power operation, and high clock rates. Recent breakthroughs in the fabrication of Silicon Dangling Bonds (SiDBs) acting as quantum dots enabled the demonstration of a sub-30 nm² OR gate and wire segments. This motivated the research community to invest manual labor in the design of additional gates and whole circuits which, however, is currently severely limited by scalability issues. In this work, these limitations are overcome by the introduction of a design automation framework that establishes a flexible topology based on hexagons as well as a corresponding Bestagon gate library for this technology and, additionally, provides automatic methods for physical design. By this, the first design automation solution for the promising SiDB platform is proposed. In an effort to support open research and open data, the resulting framework and all design files will be made available.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"355 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122717508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HDLock","authors":"Shijin Duan, Shaolei Ren, Xiaolin Xu","doi":"10.1145/3489517.3530515","DOIUrl":"https://doi.org/10.1145/3489517.3530515","url":null,"abstract":"Hyperdimensional Computing (HDC) is facing infringement issues due to straightforward computations. This work, for the first time, raises a critical vulnerability of HDC --- an attacker can reverse engineer the entire model, only requiring the unindexed hypervector memory. To mitigate this attack, we propose a defense strategy, namely HDLock, which significantly increases the reasoning cost of encoding. Specifically, HDLock adds extra feature hypervector combination and permutation in the encoding module. Compared to the standard HDC model, a two-layer-key HDLock can increase the adversarial reasoning complexity by 10 orders of magnitude without inference accuracy loss, with only 21% latency overhead.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115983631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-electrostatic FPGA placement considering SLICEL-SLICEM heterogeneity and clock feasibility","authors":"Jing Mai, Yibai Meng, Zhixiong Di, Yibo Lin","doi":"10.1145/3489517.3530568","DOIUrl":"https://doi.org/10.1145/3489517.3530568","url":null,"abstract":"Modern field-programmable gate arrays (FPGAs) contain heterogeneous resources, including CLB, DSP, BRAM, IO, etc. A Configurable Logic Block (CLB) slice is further categorized into SLICEL and SLICEM, which can be configured as specific combinations of instances in {LUT, FF, distributed RAM, SHIFT, CARRY}. This kind of heterogeneity challenges the existing FPGA placement algorithms. Meanwhile, limited clock routing resources also lead to complicated clock constraints, causing difficulties in achieving clock feasible placement solutions. In this work, we propose a heterogeneous FPGA placement framework considering SLICEL-SLICEM heterogeneity and clock feasibility based on a multi-electrostatic formulation. We support a comprehensive set of the aforementioned instance types with a uniform algorithm for wirelength, routability, and clock optimization. Experimental results on both academic and industrial benchmarks demonstrate that we outperform the state-of-the-art placers in both quality and efficiency.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126167278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}