{"title":"AB-Aware: Application Behavior Aware Management of Shared Last Level Caches","authors":"S. Pai, Newton Singh, Virendra Singh","doi":"10.1145/3194554.3194573","DOIUrl":"https://doi.org/10.1145/3194554.3194573","url":null,"abstract":"In modern multicore systems, the Last-Level Cache (LLC) is usually shared among multiple cores. Although sharing lets applications utilize cache resources efficiently, the benefit comes at the cost of increased conflict misses due to interference among applications. In a shared LLC, conventionally used LRU-based cache replacement policies logically partition the cache on an on-demand basis. Thus, cache-friendly applications sharing the LLC with streaming applications suffer due to the high data demands and low reuse of the streaming applications. Apart from different data locality behavior, applications also show different memory access behavior while accessing the LLC. Some applications inherently have parallel memory accesses, while others have more isolated long-latency accesses. The cost of the idle cycles the processor spends waiting for off-chip memory accesses is shared by parallel misses. However, misses that occur in isolation hurt performance the most. This adds another dimension to an application's behavior. We propose an application behavior aware cache replacement policy to manage the shared LLC. The proposed policy simultaneously reduces the negative interference among applications sharing the LLC and the miss penalty associated with each LLC miss. Evaluation on SPEC CPU2006 benchmarks shows that our replacement policy improves performance on dual-core and quad-core systems by up to 15.9% and 23.8%, respectively, over SRRIP for the shared LLC. It is worth noting that the effectiveness of our policy improves as the number of cores increases.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131633575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cross-Layer Perspective for Energy Efficient Processing: - From beyond-CMOS Devices to Deep Learning","authors":"X. Hu","doi":"10.1145/3194554.3200204","DOIUrl":"https://doi.org/10.1145/3194554.3200204","url":null,"abstract":"As Moore's Law based device scaling and accompanying performance scaling trends are slowing down, there is increasing interest in new technologies and computational models for fast and more energy-efficient information processing. Meanwhile, there is growing evidence that, with respect to traditional Boolean circuits and von Neumann processors, it will be challenging for beyond-CMOS devices to compete with the CMOS technology. Nevertheless, some beyond-CMOS devices demonstrate other unique characteristics such as ambipolarity, negative differential resistance, hysteresis, and oscillatory behavior. Exploiting such unique characteristics, especially in the context of alternative circuit and architectural paradigms, has the potential to offer orders of magnitude improvement in terms of power, performance and capability. In order to take full advantage of beyond-CMOS devices, however, it is no longer sufficient to develop algorithms, architectures and circuits independent of one another. Cross-layer efforts spanning from devices to circuits to architectures to algorithms are indispensable. This talk will examine energy-efficient neural network accelerators for embedded applications in this context. Several deep neural network accelerator designs based on alternative device technologies, circuit styles and architectures will be highlighted. A comprehensive application-level benchmarking study for the MNIST dataset will be presented. The discussions will demonstrate that cross-layer efforts indeed can lead to orders of magnitude gain towards achieving extreme scale energy-efficient processing.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131671618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 0.24pJ/bit, 16Gbps OOK Transmitter Circuit in 45-nm CMOS for Inter and Intra-Chip Wireless Interconnects","authors":"Tanmay Shinde, Suryanarayanan Subramaniam, Padmanabh Deshmukh, M. Ahmed, Mark A. Indovina, A. Ganguly","doi":"10.1145/3194554.3194575","DOIUrl":"https://doi.org/10.1145/3194554.3194575","url":null,"abstract":"Research in recent years has demonstrated that intra- and inter-chip wireless interconnects are capable of establishing energy-efficient data communications within as well as between multiple chips. This paper presents a circuit-level design of an energy-efficient millimeter wave (mm-wave) on-off keying (OOK) transmitter suitable for such wireless interconnects in a 45-nm CMOS process. The transmitter consists of an NMOS cross-coupled VCO, an OOK modulator and a power amplifier. The transmitter achieves a maximum modulation data rate of 16Gb/s at 60GHz with an output power of -3dBm while consuming a total power of 3.9mW, which translates to a bit-energy efficiency of 0.24pJ/bit.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131935004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Short-path Padding Method for Timing Error Resilient Circuits based on Transmission Gates Insertion","authors":"Wentao Dai, Peiye Liu, Weiwei Shan","doi":"10.1145/3194554.3194600","DOIUrl":"https://doi.org/10.1145/3194554.3194600","url":null,"abstract":"Resilient circuits based on timing error detection and correction can mitigate the timing margin effectively, but usually at the cost of extra area overhead. One of the major sources of area overhead is short-path padding (hold time fix), which is much more severe for near-threshold operation than in traditional IC design. Therefore, we propose an insertion methodology that uses transmission gates to extend short paths, which incurs less area overhead than traditional resilient methods. Because the clock-controlled transmission gate (CTG) can extend all the short paths by half a clock cycle when working as a transparent-low latch, the short-path problem is solved. Moreover, because the transmission gates synchronize the multiple short paths, they decrease the invalid flipping of combinational logic, which reduces glitch power. Applied to a SHA-256 algorithm circuit in a 28nm CMOS process with a 0.55V supply, the proposed technique substantially reduces the area overhead compared to conventional short-path padding techniques. For the combinational circuit, the area overhead drops from 153.34% to 4.43%, and for the sequential circuit, from 124.33% to 19.33%.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132325228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electromigration Design Rule aware Global and Detailed Routing Algorithm","authors":"Xiaotao Jia, Jing Wang, Yici Cai, Qiang Zhou","doi":"10.1145/3194554.3194567","DOIUrl":"https://doi.org/10.1145/3194554.3194567","url":null,"abstract":"Electromigration (EM) in interconnects is becoming a major concern as technology nodes scale. Electromigration seriously affects chip performance and signal integrity by generating shorts or opens, and thereby shortens the lifetime of integrated circuits. In this paper, we propose an EM-aware routing algorithm for both the global and detailed routing stages. Based on physics-based EM modeling and analysis, the EM issue is modeled as a physical design rule. In the global routing stage, an efficient EM-aware maze routing algorithm is implemented. A concurrent EM-aware detailed router is then proposed based on the multi-commodity flow method. Experimental results show that, compared with a general routing algorithm, the proposed EM-aware algorithm can effectively reduce the EM risk of signal wires by 92% with a slight increase in wire length and via count.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"2018 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128624902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Fault-Tolerant Last-Level Cache to Improve Reliability at Near-Threshold Voltage","authors":"W. Liu, Zhigang Wei, Wei Du","doi":"10.1145/3194554.3194583","DOIUrl":"https://doi.org/10.1145/3194554.3194583","url":null,"abstract":"Near-threshold voltage computing (NTC) improves the power and energy efficiency of caches by scaling transistor voltage. However, in large SRAM structures, such as the last-level cache (LLC), a great number of bit-cell errors occur when the supply voltage scales to near-threshold voltage. In this paper, we propose a novel fault-tolerant LLC design (NFTLLC) to deal with bit-cell failure rates higher than 1% at near-threshold voltage. NFTLLC corrects single errors and compresses multiple errors in a cache entry to improve the reliability of the last-level cache. To validate the efficiency of NFTLLC, we implement NFTLLC and prior works in gem5 and simulate them with SPEC CPU2006. The experiment shows that, compared with Concertina at a bit-cell failure rate of 1.1%, the performance of NFTLLC with a 4-byte subblock size improves by 6.8% and the cache capacity increases by 20.8%. In addition, the miss rate decreases by more than 53%, and the overhead increases by at least 16.8%.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124294400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DARPA's Data Driven Discovery of Models (D3M) and Software Defined Hardware (SDH) Programs","authors":"Wade Shen","doi":"10.1145/3194554.3200206","DOIUrl":"https://doi.org/10.1145/3194554.3200206","url":null,"abstract":"Mr. Shen joined DARPA from the Massachusetts Institute of Technology, Lincoln Laboratory where he was an associate group leader in the Human Language Technology Group. Mr. Shen’s area of research involved machine translation; speech; speaker and language recognition; information extraction and prosodic modeling for both small- and large-scale applications. Prior to joining MIT Lincoln Laboratory, Wade helped found and served as chief technology officer for Vocentric Corporation, a company specializing in speech technologies for resource-constrained and embedded applications.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131300882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static Design of Spin Transfer Torques Magnetic Look Up Tables for ASIC Designs","authors":"A. Attaran, T. Sheaves, Praveen Kumar Mugula, H. Mahmoodi","doi":"10.1145/3194554.3194651","DOIUrl":"https://doi.org/10.1145/3194554.3194651","url":null,"abstract":"In this paper, we propose a static approach to the design of Spin Transfer Torque Look Up Tables (STT-LUT) for integration in ASICs and investigate the sensing reliability of the proposed design in detail. The proposed design style utilizes STT-Latches, whose sensing reliability is key in determining the overall reliability of the proposed static STT-LUT. Simulation results in a 10nm FinFET CMOS technology show that the proposed static STT-LUT design exhibits up to 26% read delay reduction compared to the best dynamic STT-LUT design, and more than 2.5X reduction in sensing failure rate.","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130464350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Session 1: Powering Heterogeneous IoT Systems: Design for Efficiency, Security and Sustainability","authors":"S. Kose, Inna Partin-Vaisband","doi":"10.1145/3252914","DOIUrl":"https://doi.org/10.1145/3252914","url":null,"abstract":"","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124664688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Session 4: Implementing and Benchmarking Post-Quantum Cryptography in Hardware","authors":"K. Gaj","doi":"10.1145/3252917","DOIUrl":"https://doi.org/10.1145/3252917","url":null,"abstract":"","PeriodicalId":215940,"journal":{"name":"Proceedings of the 2018 on Great Lakes Symposium on VLSI","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133371091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}