{"title":"Efficient Subquadratic Space Complexity Digit-Serial Multipliers over GF(2m) based on Bivariate Polynomial Basis Representation","authors":"Chiou-Yng Lee, Jiafeng Xie","doi":"10.1109/ASP-DAC47756.2020.9045615","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045615","url":null,"abstract":"Digit-serial finite field multipliers over GF($2^{m}$) with subquadratic space complexity are critical components to many applications such as elliptic curve cryptography. In this paper, we propose a pair of novel digit-serial multipliers based on bivariate polynomial basis (BPB). Firstly, we have proposed a novel digit-serial BPB multiplication algorithm based on a new decomposition strategy. Secondly, the proposed algorithm is properly mapped into a pair of pipelined and non-pipelined digit-serial multipliers. Lastly, through the detailed complexity analysis and comparison, the proposed designs are found to have less area-time complexities than the competing ones.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131785188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximation of Quantum States Using Decision Diagrams","authors":"Alwin Zulehner, S. Hillmich, I. Markov, R. Wille","doi":"10.1109/ASP-DAC47756.2020.9045454","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045454","url":null,"abstract":"The computational power of quantum computers poses major challenges to new design tools since representing pure quantum states typically requires exponentially large memory. As shown previously, decision diagrams can reduce these memory requirements by exploiting redundancies. In this work, we demonstrate further reductions by allowing for small inaccuracies in the quantum state representation. Such inaccuracies are legitimate since quantum computers themselves experience gate and measurement errors and since quantum algorithms are somewhat resistant to errors (even without error correction). We develop four dedicated schemes that exploit these observations and effectively approximate quantum states represented by decision diagrams. We empirically show that the proposed schemes reduce the size of decision diagrams by up to several orders of magnitude while controlling the fidelity of approximate quantum state representations.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"42 14","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132835685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Efficient Kyber on FPGAs: A Processor for Vector of Polynomials","authors":"Zhaohui Chen, Yuan Ma, Tianyu Chen, Jingqiang Lin, Jiwu Jing","doi":"10.1109/ASP-DAC47756.2020.9045459","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045459","url":null,"abstract":"Kyber is a promising candidate in the post-quantum cryptography standardization process. In this paper, we propose a targeted optimization strategy and implement a processor for Kyber on FPGAs. By merging the operations, we cut 29.4% of clock cycles for Kyber512 and 33.3% for Kyber1024 compared with the textbook implementations. We utilize the Gentleman-Sande (GS) butterfly to optimize the Number-Theoretic Transform (NTT) implementation. The bottleneck of memory access is broken by taking advantage of a dual-column sequential scheme. We further propose a pipeline architecture for better performance. The optimizations help the processor achieve 31684 NTT operations per second using only 477 LUTs, 237 FFs and 1 DSP. Our strategy is at least 3x more efficient than the state-of-the-art module for NTT with a similar security level.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131052089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Self-Heating on Performance, Power and Reliability in FinFET Technology","authors":"Victor M. van Santen, P. Genssler, Om. Prakash, Simon Thomann, J. Henkel, H. Amrouch","doi":"10.1109/ASP-DAC47756.2020.9045582","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045582","url":null,"abstract":"Self-heating is one of the biggest threats to reliability in current and advanced CMOS technologies like FinFET and Nanowire, respectively. Encapsulating the channel with the gate dielectric improves electrostatics, but also thermally insulates the channel, resulting in elevated channel temperatures as the generated heat is trapped within the channel. Elevated channel temperatures lower the performance, increase leakage power and degrade the reliability of circuits. Self-heating becomes worse in each new transistor structure (from planar transistor to FinFET to Nanowire) due to the ever-increasing thermal resistance of the transistor. This leads to elevated temperatures, which must be carefully considered while designing circuits. Otherwise, reliability cannot be ensured. This work presents a self-heating study to illustrate how self-heating matters in digital circuits. It also explores the impact of running workloads in SRAM arrays, such as register files in CPUs, and how self-heating effects in SRAM cells can be mitigated.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133807798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Programmable Neuromorphic Circuit based on Printed Electrolyte-Gated Transistors","authors":"Dennis D. Weller, Michael Hefenbrock, M. Tahoori, J. Aghassi‐Hagmann, M. Beigl","doi":"10.1109/ASP-DAC47756.2020.9045211","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045211","url":null,"abstract":"Neuromorphic computing systems have demonstrated many advantages for popular classification problems with significantly less computational resources. We present in this paper the design, fabrication and training of a programmable neuromorphic circuit, which is based on printed electrolyte-gated field-effect transistors (EGFETs). Based on a printable neuron architecture involving several resistors and one transistor, the proposed circuit can realize multiply-add and activation functions. The functionality of the circuit, i.e. the weights of the neural network, can be set during a post-fabrication step in the form of printing resistors to the crossbar. Besides the fabrication of a programmable neuron, we also provide a learning algorithm, tailored to the requirements of the technology and the proposed programmable neuron design, which is verified through simulations. The proposed neuromorphic circuit operates at 5 V and occupies 385 mm² of area.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114426417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EA-HRT: An Energy-Aware scheduler for Heterogeneous Real-Time systems","authors":"S. Moulik, Rishabh Chaudhary, Zinea Das, A. Sarkar","doi":"10.1109/ASP-DAC47756.2020.9045240","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045240","url":null,"abstract":"Developing energy-efficient schedulers for real-time heterogeneous platforms executing periodic tasks is an onerous as well as a computationally challenging issue. This research presents a heuristic strategy named EA-HRT for DVFS-based energy-aware scheduling of a set of periodic tasks executing on a heterogeneous multicore platform. Initially, it calculates the execution demands of every task on each of the different types of cores. Then, it simultaneously allocates each task on available cores and selects operating frequencies for the concerned cores such that the summation of execution demands of all tasks is met and there is minimum change in energy consumption for the system. Experimental results show that our proposed strategy is not only able to achieve appreciable energy savings with respect to the state-of-the-art (2% to 37% on average) but also enables significant improvement in resource utilization (as high as 57%).","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116896739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tuning-Free Hardware Reservoir Based on MOSFET Crossbar Array for Practical Echo State Network Implementation","authors":"Yuki Kume, S. Bian, Takashi Sato","doi":"10.1109/ASP-DAC47756.2020.9045694","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045694","url":null,"abstract":"Echo state network (ESN) is a class of recurrent neural network, and is known for drastically reducing the training time by the use of reservoir, a random and fixed network as the input and middle layers. In this paper, we propose a hardware implementation of ESN that uses practical MOSFET-based reservoir. As opposed to existing reservoirs that require additional tuning of network weights for improved stability, our ESN requires no post-training parameter tuning. To this end, we apply the circular law of random matrix to sparse reservoirs to determine a stable and fixed feedback gain. Through the evaluations using Mackey-Glass time-series dataset, the proposed ESN performs successful inference without post parameter tuning.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125802181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equivalent Capacitance Guided Dummy Fill Insertion for Timing and Manufacturability","authors":"Sheng-Jung Yu, Chen-Chien Kao, Chia‐Han Huang, I. Jiang","doi":"10.1109/ASP-DAC47756.2020.9045668","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045668","url":null,"abstract":"To improve manufacturability, dummy fill insertion is widely adopted for reducing the thickness variation after chemical mechanical polishing. However, inserted metal fills induce significant coupling to nearby signal nets, thus possibly incurring timing degradation. Existing timing-aware fill insertion strategies focus on optimizing induced coupling capacitance instead of resultant equivalent capacitance. Therefore, the impact on timing cannot be fully captured. In contrast, in this paper, we analyze equivalent capacitance friendly regions for dummy fills. The analysis can wisely guide dummy fill insertion to prevent unwanted and unnecessary increase in the resultant equivalent capacitance of timing critical nets. Experimental results based on the ICCAD 2018 CAD Contest benchmark suite show that our solution outperforms the contest winning teams and state-of-the-art work. Moreover, our analysis results are highly correlated to actual equivalent capacitance values and indeed provide accurate guidance for timing-aware dummy fill insertion.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129890358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SP&R: Simultaneous Placement and Routing framework for standard cell synthesis in sub-7nm","authors":"Dongwon Park, Daeyeal Lee, Ilgweon Kang, Sicun Gao, Bill Lin, Chung-Kuan Cheng","doi":"10.1109/ASP-DAC47756.2020.9045729","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045729","url":null,"abstract":"Standard cell synthesis requires careful engineering approaches to ensure routability across various digital IC designs since physical design (PD) for sub-7nm technology nodes demands holistic efforts to address urgent and nontrivial design challenges. The smaller number of routing tracks and more complex design rules due to the sophisticated multi-patterning technology make place-and-route (P&R) for designing a standard cell extremely hard and time-consuming. Many conventional approaches have been suggested for improving transistor-level P&R and pin accessibility, nonetheless insufficient because of the heuristic/divide-and-conquer manners. In this paper, we propose a novel framework, SP&R, which simultaneously solves P&R for designing standard cell’s layout without deploying any sequential procedures (between place and route steps) by using dynamic pin allocation-based cell synthesis. The proposed SP&R utilizes the Optimization Modulo Theories (OMT), an extension of the Satisfiability modulo theories (SMT), to obtain optimal standard cell layout by virtue of SAT (Boolean Satisfiability)-based fast reasoning ability. We validate that our SP&R framework achieves 10.5% of reduction on average in terms of metal length compared to the sequential approach, through practical standard cell designs targeting sub-7nm technology nodes.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128747724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event Delivery using Prediction for Faster Parallel SystemC Simulation","authors":"Zhongqi Cheng, E. Arasteh, R. Dömer","doi":"10.1109/ASP-DAC47756.2020.9045492","DOIUrl":"https://doi.org/10.1109/ASP-DAC47756.2020.9045492","url":null,"abstract":"Out-of-order Parallel Discrete Event Simulation (OoO PDES) is an advanced simulation approach that efficiently verifies and validates SystemC models. To preserve the simulation semantics, OoO PDES performs a conservative event delivery strategy which often postpones the execution of waiting threads due to unknown future behaviors of the model. In this paper, based on predicted behaviors of threads, we introduce a novel event delivery strategy that allows waiting threads to resume execution earlier, resulting in significantly increased simulation speed. Experimental results show that the proposed approach increases the OoO PDES simulation speed by up to 4.9x compared to the original one on a 4-core machine.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128239568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}