{"title":"Clock Skew Scheduling in the Presence of Heavily Gated Clock Networks","authors":"Weicheng Liu, E. Salman, Can Sitik, B. Taskin","doi":"10.1145/2742060.2742092","DOIUrl":"https://doi.org/10.1145/2742060.2742092","url":null,"abstract":"Clock skew scheduling is a common and well known technique to improve the performance of sequential circuits by exploiting the mismatches in the data path delays. Existing clock skew scheduling techniques, however, cannot effectively consider heavily gated clock networks where a local clock tree exists between clock gating cells and registers. A methodology is proposed in this paper to efficiently achieve clock skew scheduling in circuits with gated clock networks. The methodology is implemented via both linear programming and constraint graph based approaches, and evaluated using the largest ISCAS'89 benchmark circuits with clock gating. The results demonstrate up to approximately 21% reduction in clock period while maintaining the power savings achieved by clock gating. A conventional design flow is used for the experiments, demonstrating the applicability of the proposed algorithms to automation.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125091809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Review and Evaluation of Approximate Adders","authors":"Honglan Jiang, Jie Han, F. Lombardi","doi":"10.1145/2742060.2743760","DOIUrl":"https://doi.org/10.1145/2742060.2743760","url":null,"abstract":"As an important arithmetic module, the adder plays a key role in determining the speed and power consumption of a digital signal processing (DSP) system. The demands of high speed and power efficiency as well as the fault tolerance nature of some applications have promoted the development of approximate adders. This paper reviews current approximate adder designs and provides a comparative evaluation in terms of both error and circuit characteristics. Simulation results show that the equal segmentation adder (ESA) is the most hardware-efficient design, but it has the lowest accuracy in terms of error rate (ER) and mean relative error distance (MRED). The error-tolerant adder type II (ETAII), the speculative carry select adder (SCSA) and the accuracy-configurable approximate adder (ACAA) are equally accurate (provided that the same parameters are used), however ETATII incurs the lowest power-delay-product (PDP) among them. The almost correct adder (ACA) is the most power consuming scheme with a moderate accuracy. The lower-part-OR adder (LOA) is the slowest, but it is highly efficient in power dissipation.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122942015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dataline Isolated Differential Current Feed/Mode Sense Amplifier for Small Icell SRAM Using FinFET","authors":"B. Reniwal, V. Vijayvargiya, Pooran Singh, S. Vishvakarma, D. Dwivedi","doi":"10.1145/2742060.2742104","DOIUrl":"https://doi.org/10.1145/2742060.2742104","url":null,"abstract":"This paper for the first time presents a novel, high-performance and robust current feed sense amplifiers (CF-SA) design for small ICell SRAM in 20nm Fin-shaped field effect transistor (FinFET) technology. The CFSA incorporates isolated DL current sensing approach which provides the higher Current Ratio Amplification (CRA) factor. The CF-SA significantly outperforms with 66.89% and 31.47% lower sensing delay than CCSA [13] and HSA [8] respectively under similar ICell and bit-line and data-line capacitance. Our results show that even at the worst corner the CF-SA demonstrates 2.15x and 3.02x higher differential current and 2.23x and 1.7x higher data-line differential voltage with 66.6% and 34.32% higher mean (μ) than those of the best prior arts. Furthermore, failure probability of the proposed design against process parameter variations is rigorously analyzed through Monte Carlo simulations.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122741932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Test Application for Rapid Multi-Temperature Testing","authors":"Nima Aghaee, Zebo Peng, P. Eles","doi":"10.1145/2742060.2742064","DOIUrl":"https://doi.org/10.1145/2742060.2742064","url":null,"abstract":"Different defects may manifest themselves at different temperatures. Therefore, the tests that target such temperature-dependent defects must be applied at different temperatures appropriate for detecting them. Such multi-temperature testing scheme applies tests at different required temperatures. It is known that a test's power dissipation depends on the previously applied test. Therefore, the same set of tests when organized differently dissipates different amounts of power. The technique proposed in this paper organizes the tests efficiently so that the resulted power levels lead to the required temperatures. Consequently a rapid multi-temperature testing is achieved. Experimental studies demonstrate the efficiency of the proposed technique.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122485107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Ternary Content Addressable Cell Using a Single Phase Change Memory (PCM)","authors":"P. Junsangsri, F. Lombardi, Jie Han","doi":"10.1145/2742060.2742062","DOIUrl":"https://doi.org/10.1145/2742060.2742062","url":null,"abstract":"This paper presents the novel design of a Ternary Content Addressable Memory (TCAM); different from existing designs found in the technical literature, this cell utilizes a single Phase Change Memory (PCM) as storage element and ambipolarity for comparison. A memory core consisting of a CMOS transistor and a PCM is employed (1T1P); for the search operation, the data in the 1T1P memory core is read and its value is established using two differential sense amplifiers. Compared with other non-volatile memory cells using emerging technologies (such as PCM-based, and memristor-based), simulation results show that the proposed non-volatile TCAM cell offer significant advantages in terms of power dissipation, PDP for the search operation, write time and reduced circuit complexity (in terms of lower counts in transistors and storage elements).","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124233226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EDA Challenges for Memristor-Crossbar based Neuromorphic Computing","authors":"Beiye Liu, W. Wen, Yiran Chen, Xin Li, Chi-Ruo Wu, Tsung-Yi Ho","doi":"10.1145/2742060.2743754","DOIUrl":"https://doi.org/10.1145/2742060.2743754","url":null,"abstract":"The increasing gap between the high data processing capability of modern computing systems and the limited memory bandwidth motivated the recent significant research on neuromorphic computing systems (NCS), which are inspired from the working mechanism of human brains. Discovery of memristor further accelerates engineering realization of NCS by leveraging the similarity between synaptic connections in neural networks and programming weight of the memristor. However, to achieve a stable large-scale NCS for practical applications, many essential EDA design challenges still need to be overcome especially the state-of-the-art memristor crossbar structure is adopted. In this paper, we summarize some of our recent published works about enhancing the design robustness and efficiency of memristor crossbar based NCS. The experiments show that the impacts of noises generated by process variations and the IR-drop over the crossbar can be effectively suppressed by our noise-eliminating training method and IR-drop compensation technique. Moreover, our network clustering techniques can alleviate the challenges of limited crossbar scale and routing congestion in NCS implementations.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117161976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Interconnects and NoCs","authors":"James W. Stine","doi":"10.1145/3254017","DOIUrl":"https://doi.org/10.1145/3254017","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"37 13","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120816424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Directed Self-Assembly Based Cut Mask Optimization for Unidirectional Design","authors":"Jiaojiao Ou, Bei Yu, Jhih-Rong Gao, D. Pan, M. Preil, A. Latypov","doi":"10.1145/2742060.2742114","DOIUrl":"https://doi.org/10.1145/2742060.2742114","url":null,"abstract":"Unidirectional design has attracted lots of attention with the scaling down of technology nodes. However, due to the limitation of traditional lithography, printing the randomly distributed dense cuts becomes a big challenge for highly scaled unidirectional layout. Recently directed self-assembly (DSA) has emerged as a promising lithography technique candidate for cut manufacturing because of its ability to form small cylinders inside the guiding templates and the actual pattern size can be greatly reduced. In this paper, we perform a comprehensive study on the DSA cut mask optimization problem. We first formulate it as integer linear programming (ILP) to assign cuts to different guiding templates, targeting at minimum conflicts and line-end extensions. As ILP may not be scalable for very large size problem, we further propose a speed-up method to decompose the problem into smaller ones and solve them separately. We then merge and legalize the solutions without much loss of result quality. The proposed approaches can be easily extended to handle more DSA guiding patterns with complicated shapes. Experimental results show that our methods can significantly reduce the total number of unresolvable patterns and the line-end extensions for the targeted layouts.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115915568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-Grained Voltage Boosting for Improving Yield in Near-Threshold Many-Core Processors","authors":"J. Kong, Arslan Munir, F. Koushanfar","doi":"10.1145/2742060.2742105","DOIUrl":"https://doi.org/10.1145/2742060.2742105","url":null,"abstract":"Process variation is a major impediment in optimizing yield, energy, and performance in near-threshold many-core processors. In this paper, we present a comprehensive analysis on yield losses in near-threshold many-core processors. Based on our analysis, we propose energy-efficient yield improvement techniques for near-threshold many-core processors: SRAM cell arrays and Wordline driver voltage Boosting (SWBoost) and Cache voltage Boosting (CBoost). Results reveal that SWBoost and CBoost improve a chip yield by up to 66% and 83%, respectively. Furthermore, runtime energy overheads of SWBoost and CBoost are only 0.46% and 0.54%, respectively, which are much lower than conventional voltage boosting techniques.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"325 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115871294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Efficient RRAM Spiking Neural Network for Real Time Classification","authors":"Yu Wang, Tianqi Tang, Lixue Xia, Boxun Li, P. Gu, Huazhong Yang, Hai Helen Li, Yuan Xie","doi":"10.1145/2742060.2743756","DOIUrl":"https://doi.org/10.1145/2742060.2743756","url":null,"abstract":"Inspired by the human brain's function and efficiency, neuromorphic computing offers a promising solution for a wide set of tasks, ranging from brain machine interfaces to real-time classification. The spiking neural network (SNN), which encodes and processes information with bionic spikes, is an emerging neuromorphic model with great potential to drastically promote the performance and efficiency of computing systems. However, an energy efficient hardware implementation and the difficulty of training the model significantly limit the application of the spiking neural network. In this work, we address these issues by building an SNN-based energy efficient system for real time classification with metal-oxide resistive switching random-access memory (RRAM) devices. We implement different training algorithms of SNN, including Spiking Time Dependent Plasticity (STDP) and Neural Sampling method. Our RRAM SNN systems for these two training algorithms show good power efficiency and recognition performance on realtime classification tasks, such as the MNIST digit recognition. Finally, we propose a possible direction to further improve the classification accuracy by boosting multiple SNNs.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114780351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}