{"title":"Mixed-Signal POp/J Computing with Nonvolatile Memories","authors":"M. Mahmoodi, D. Strukov","doi":"10.1145/3194554.3194612","DOIUrl":"https://doi.org/10.1145/3194554.3194612","url":null,"abstract":"The present-day revolution in deep learning was triggered not by any significant algorithm breakthrough, but by the use of more powerful GPU hardware [1]. Though this revolution has stimulated the development of even more powerful dedicated digital systems [2, 3], their speed and energy efficiency are still insufficient for ultrafast pattern classification and more ambitious cognitive tasks. The main reason is that the use of digital operations for the implementation of neuromorphic networks, with their high redundancy and noise/variability tolerance, is inherently unnatural. On the other hand, the network performance may be dramatically improved using mixed-signal integrated circuits, where the key inference-stage operation, the vector-by-matrix multiplication, is implemented on the physical level by utilization of the fundamental Ohm and Kirchhoff laws [4-6].","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114533968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Coverage Directed Test Generation for Functional Verification: A Neural Network-based Framework","authors":"Fanchao Wang, Hanbin Zhu, Pranjay Popli, Yao Xiao, P. Bogdan, Shahin Nazarian","doi":"10.1145/3194554.3194561","DOIUrl":"https://doi.org/10.1145/3194554.3194561","url":null,"abstract":"With increasing design complexity, the correlation between test transactions and functional properties becomes non-intuitive, hence impacting the reliability of test generation. This paper presents a modified coverage directed test generation based on an Artificial Neural Network (ANN). The ANN extracts features of test transactions and only those which are learned to be critical, will be sent to the design under verification. Furthermore, the priority of coverage groups is dynamically learned based on the previous test iterations. With ANN-based screening, low-coverage or redundant assertions will be filtered out, which helps accelerate the verification process. This allows our framework to learn from the results of the previous vectors and use that knowledge to select the following test vectors. 
Our experimental results confirm that our learning-based framework can improve the speed of existing functional verification techniques by 24.5x and also deliver assertion coverage improvement, ranging from 4.3x to 28.9x, compared to traditional coverage directed test generation, implemented in UVM.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114970409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Complexity Burst Error Correcting Codes to Correct MBUs in SRAMs","authors":"Abhishek Das, N. Touba","doi":"10.1145/3194554.3194570","DOIUrl":"https://doi.org/10.1145/3194554.3194570","url":null,"abstract":"Multiple bit upsets (MBUs) caused by high energy radiation are the most common source of soft errors in static random-access memories (SRAMs) affecting multiple cells. Burst error correcting Hamming codes have most commonly been used to correct MBUs in SRAM cells since they have low redundancy and low decoder latency. But with technology scaling, the number of bits being affected increases, thus requiring an increase in the burst size that can be corrected. However, this is a problem because it increases the number of syndromes exponentially, thus increasing the decoder complexity exponentially as well. In this paper, a new burst error correcting code based on Hamming codes is proposed which allows much better scaling of decoder complexity as the burst size is increased. For larger burst sizes, it can provide significantly smaller and faster decoders than existing methods, thus providing higher reliability at an affordable cost. Moreover, there is frequently no increase in the number of check bits or a very minimal increase in comparison with existing methods. A general construction and decoding methodology for the new codes is proposed. 
Experimental results are presented comparing the decoder complexity for the proposed codes with conventional burst error correcting Hamming codes demonstrating the significant improvements that can be achieved.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128569276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Lock: Dense Layout-Level Interconnect Locking using Cross-bar Architectures","authors":"Kaveh Shamsi, Meng Li, D. Pan, Yier Jin","doi":"10.1145/3194554.3194580","DOIUrl":"https://doi.org/10.1145/3194554.3194580","url":null,"abstract":"Logic locking is an attractive defense against a series of hardware security threats. However, oracle guided attacks based on advanced Boolean reasoning engines such as SAT, ATPG and model-checking have made it difficult to securely lock chips with low overhead. While the majority of existing locking schemes focus on gate-level locking, in this paper we present a layout-inclusive interconnect locking scheme based on cross-bars of metal-to-metal programmable-via devices. We demonstrate how this enables configuring a large obfuscation key with a small number of physical key wires contributing to zero to little substrate area overhead. Dense interconnect locking based on these circuit level primitives shows orders of magnitude better SAT attack resiliency compared to an XOR/XNOR gate-insertion locking with the same key length which has a much higher overhead.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133486352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Timing Driven Tree Surgery in Routing with Machine Learning-based Acceleration","authors":"Peishan Tu, Chak-Wa Pui, Evangeline F. Y. Young","doi":"10.1145/3194554.3194556","DOIUrl":"https://doi.org/10.1145/3194554.3194556","url":null,"abstract":"In global routing, both timing and routability are critical criteria for measuring the performance of a design. However, these two objectives naturally conflict with each other during routing. In this paper, a tree surgery technique is presented to adjust routing tree topologies in global routing to fix timing. We formulate the problem as a quadratic program (QP), which adjusts routing topologies of all the nets from a global perspective and takes congestion into consideration to trade off timing and routability objectives. We also apply machine learning-based techniques to accelerate our algorithm, which offers a fast and effective way to solve the problem. Experimental results on ICCAD 2015 benchmarks show that our algorithms can achieve 10.12% timing improvement with no significant degradation in routability and wirelength. With machine learning-based acceleration (MLA), our results can be obtained in almost negligible runtime.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126822600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Power Optical Interconnects based on Resonant Silicon Photonic Devices: Recent Advances and Challenges","authors":"M. Bahadori, K. Bergman","doi":"10.1145/3194554.3194606","DOIUrl":"https://doi.org/10.1145/3194554.3194606","url":null,"abstract":"The progressive blooming of silicon photonics technology (SiP) over the last decade has indicated that optical interconnects may substitute the electrical wires for data movement over short distances in the future. A key enabler is the resonant structures that can participate in both modulation and demultiplexing of a high throughput wavelength division multiplexed (WDM) photonic link. The optical and electro-optical properties of such devices are subject to various design considerations, operation conditions, and optimization procedures. We present recent technological advances in photonic links based on resonant structures and highlight the key challenges that must be overcome at a large scale. Furthermore, we discuss how the design space of these resonant devices, down to the geometrical parameters and fabrication errors, can affect the performance and reliability of a photonic link.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123769904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Space Exploration of Magnetic Tunnel Junction based Stochastic Computing in Deep Learning","authors":"You Wang, Yue Zhang, Youguang Zhang, Weisheng Zhao, Hao Cai, L. Naviner","doi":"10.1145/3194554.3194619","DOIUrl":"https://doi.org/10.1145/3194554.3194619","url":null,"abstract":"The magnetic tunnel junction (MTJ) is considered a promising memory candidate in the more than Moore era because of its high power efficiency, fast access speed, nearly infinite endurance and easy 3D integration. Its nondeterministic switching behavior has been exploited to open new directions for computing methods, such as stochastic computing. In this paper, the application of stochastic switching behavior in stochastic computing is explored for deep neural networks (DNNs). The stochastic computing method features low logic complexity, low energy consumption and fine-grained parallelism, boosting the performance of a DNN system when combined with MTJs. As a key block of stochastic computing, an MTJ based true random number generator design is presented in detail. The functionality has been validated by combining the hardware design and post-processing in software. Simulation results on a handwritten digit recognition test are presented to show the accuracy. 
Furthermore, the performance is investigated in terms of accuracy, energy consumption and memory occupation to find more efficient techniques.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127276411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Design of Reliable Heterogeneous Systems via Checkpoint Placement and Core Assignment","authors":"E. Sha, Hailiang Dong, Weiwen Jiang, Qingfeng Zhuge, Xianzhang Chen, Lei Yang","doi":"10.1145/3194554.3194642","DOIUrl":"https://doi.org/10.1145/3194554.3194642","url":null,"abstract":"This paper studies two basic problems in the design of high-performance and high-reliability heterogeneous systems: (1) what type of core to execute each task, and (2) where to place checkpoints in the execution of tasks. The implementation of checkpointing techniques on the novel persistent memory (e.g., 3D Xpoint memory) based heterogeneous systems faces a bundle of new problems. First, the assignments of tasks may greatly influence the execution time of the whole application. Therefore, with the same time constraint, the reliability of the resultant system can be significantly affected. Second, creating checkpoints will incur heavy writes on persistent memories and reduce the lifetime of devices. In this paper, we optimally construct reliable systems by assigning tasks to the most suitable cores and placing minimum number of checkpoints in the application, such that the resultant system can satisfy the time constraint in the presence of faults. We devise an efficient dynamic programming algorithm to obtain the optimal assignment and checkpoint placement. 
Experimental results demonstrate that, compared with existing approaches, our technique can achieve a 44% reduction in the number of checkpoints on average.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129739801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impolite High Speed Interfaces with Asynchronous Pulse Logic","authors":"Merritt Miller, Carrie Segal, D. McCarthy, Aditya Dalakoti, Prashansa Mukim, F. Brewer","doi":"10.1145/3194554.3194592","DOIUrl":"https://doi.org/10.1145/3194554.3194592","url":null,"abstract":"We present a design solution that allows design of higher-than-core rate operation with techniques that avoid PLL/DLL blocks to provide higher speed timing. Many modern integrated circuits (ICs) have high speed interfaces which operate at higher cycle rates than the core of the IC. As a result of the higher-than-core rate, these interfaces are not directly representable in the core sequential logic. Asynchronous pulse logic offers an alternative design method for high speed interfaces with similar performance, simpler circuitry and without resorting to high-power logic cells such as emitter coupled logic. Formal and practical considerations for constructing high-speed interfaces are described. Gate designs and timing information for example cases are presented. These cases suggest that 80% improvements on rate compared to traditional clocked logic are possible.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126345248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SimTRaX","authors":"K. Shkurko, Tim Grant, E. Brunvand, Daniel Kopta, J. Spjut, Elena Vasiou, Ian Mallett, Cem Yuksel","doi":"10.1145/3194554.3194650","DOIUrl":"https://doi.org/10.1145/3194554.3194650","url":null,"abstract":"SimTRaX is a simulation infrastructure for simultaneous exploration of highly parallel accelerator architectures and how applications map to them. The infrastructure targets both cycle-accurate and functional simulation of architectures with thousands of simple cores that may share expensive computation and memory resources. A modified LLVM backend used to compile C++ programs for the simulated architecture allows the user to create custom instructions that access proposed special-purpose hardware and to debug and profile the applications being executed. The simulator models a full memory hierarchy including registers, local scratchpad RAM, shared caches, external memory channels, and DRAM main memory, leveraging the USIMM DRAM simulator to provide accurate dynamic latencies and power usage. SimTRaX provides a powerful and flexible infrastructure for exploring a class of extremely parallel architectures for parallel applications that are not easily simulated using existing simulators.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117333329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}