{"title":"Bandwidth Adaptive Nanophotonic Crossbars with Clockwise/Counter-clockwise Optical Routing","authors":"M. Kennedy, Avinash Karanth Kodi","doi":"10.1109/VLSID.2015.26","DOIUrl":"https://doi.org/10.1109/VLSID.2015.26","url":null,"abstract":"Future processors are anticipated to have hundreds or even thousands of processing cores placed entirely on a single silicon chip. The increasing number of cores placed on a single chip presents new challenges, pushing researchers to explore opportunities in emerging technologies such as on-chip silicon nanophotonics. Implications of nanophotonic technology has created a unique landscape for new interconnect designs. Among the many architectures made possible by nanophotonics, there has been notable interest in crossbar topologies that were previously impractical using only electrical components. In this paper, we present a new nanophotonic crossbar interconnect architecture with the aim of retaining the low latency, single-hop characteristic of the crossbar topology, while also improving the networks utility of the static laser source which is often wasted to insertion losses and unused bandwidth. We compare our architecture design to other proposed architectures according to area, power consumption, throughput, and latency. Approximately a 13% improvement in throughput is achieved compared to other optical crossbar topologies and a 92% improvement is achieved compared to a conventional electrical flattened butterfly topology on synthetic traffic patterns.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126331675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EvoDeb: Debugging Evolving Hardware Designs","authors":"Debjyoti Bhattacharjee, A. Banerjee, A. Chattopadhyay","doi":"10.1109/VLSID.2015.87","DOIUrl":"https://doi.org/10.1109/VLSID.2015.87","url":null,"abstract":"Increasing design complexity, skyrocketing fabrication costs for modern digital systems coupled with an unacceptably large number of silicon respins led to growing importance of comprehensive and automated design verification. Akin to software configuration management, it is becoming commonplace to maintain large hardware design code-bases with hardware configuration management tools. A missing piece of crucial technology in this approach is to manage design verification across evolving hardware designs. In this paper, we propose an efficient methodology for automatically localizing design errors across design versions. The proposed technique, EvoDeb, can be easily integrated into a hardware configuration management framework and is scalable for large designs. We demonstrate the efficacy of EvoDeb on a couple of bugs on open-source hardware designs across multiple evolving variants.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122070352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations","authors":"Farhad Merchant, Arka Maity, Mahesh Mahadurkar, Kapil Vatwani, Ishan Munje, C. MadhavaKrishna, N. Sivanandan, N. Gopalan, S. Raha, S. Nandy, R. Narayan","doi":"10.1109/VLSID.2015.31","DOIUrl":"https://doi.org/10.1109/VLSID.2015.31","url":null,"abstract":"LU and QR factorizations are the computationally dear part of many applications ranging from large scale simulations (e.g. Computational fluid dynamics) to augmented reality. These factorizations exhibit time complexity of O(n^3) and are difficult to accelerate due to presence of bandwidth bound kernels, BLAS-1 or BLAS-2 (level-1 or level-2 Basic Linear Algebra Subprograms) along with compute bound kernels (BLAS-3, level-3 BLAS). On the other hand, Coarse Grained Reconfigurable Architectures (CGRAs) have gained tremendous popularity as accelerators in embedded systems due to their flexibility and ease of use. Provisioning these accelerators in High Performance Computing (HPC) platforms is the research challenge wrestled by the computer scientists. We consider a CGRA environment in which several Compute Elements (CEs) enhanced with Custom Functional Units (CFUs) are interconnected over a Network-on-Chip (NoC). In this paper, we carry out extensive micro-architectural exploration for accelerating core kernels like Matrix Multiplication (MM) (BLAS-3) for LU and QR factorizations. Our 5 different design enhancements lead to the reduction in the latency of BLAS-3 kernels. On a stand-alone CFU, we achieve up to 8x speed-up for MM. A commensurate improvement is observed for MM in a CGRA environment. We achieve better GFLOPS/mm^2 compared to recent implementations.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129598869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of NOR Logic Based on Material Implication on CMOL FPGA Architecture","authors":"P. Mane, Nishil Talati, Ameya Riswadkar, Bhavan Jasani, C. K. Ramesha","doi":"10.1109/VLSID.2015.94","DOIUrl":"https://doi.org/10.1109/VLSID.2015.94","url":null,"abstract":"Memristor based nanocrossbar layer fabricated on CMOS layer has shown tremendous potential as high density memory and in reconfigurable logic architectures. Instead of having predesigned Configurable Logic Blocks (CLBs) and memory for reconfiguration as in FPGA, they can be instantiated in nanocrossbar memory as the need arises. We have shown in this paper, the novel design of NOR block as basic unit of computation for in-memory calculations to implement on CMOL FPGA architecture. This block implements its function using material implication. The proposed scheme is against the naturally arising boolean logic based NOR block in CMOL FPGA.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128287569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High-Performance Energy-Efficient Hybrid Redundant MAC for Error-Resilient Applications","authors":"Sunil Dutt, Anshu Chauhan, Rahul Bhadoriya, Sukumar Nandi, G. Trivedi","doi":"10.1109/VLSID.2015.65","DOIUrl":"https://doi.org/10.1109/VLSID.2015.65","url":null,"abstract":"In the majority of Digital Signal Processing (DSP) applications, such as image, audio and video processing, the final result is interpreted by human senses, and, the fact of confined perception of human senses declines the strict restriction on accuracy. Thus, by adopting the emerging concept of approximate computing, we propose an approximate radix-2 hybrid redundant Multiply-and-Accumulate (Approx MAC) unit which stems a novel Speed-Power-Accuracy-Area (SPAA) metrics. The Approx MAC unit attains tremendous improvements in computational performance, energy efficiency and silicon area with a trivial degradation in the output quality. To inspect the effectiveness of the proposed approach in real-time DSP applications, we demonstrate an Approx MAC unit embedded JPEG-E-X IP core architecture. The Approx MAC unit with 40 approximate LSBs ensures 7.177x and 1.526x speedup, 1.594x and 4.163x energy efficiency, and 1.131x and 1.277x silicon area improvements over binary and hybrid redundant MAC units, respectively. Moreover, the Approx MAC unit with 40 approximate LSBs decorates power precision and delay-precision metrics by 14.71% and 32.95%, respectively.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116167415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Way Halted Prediction Cache: An Energy Efficient Cache Architecture for Embedded Processors","authors":"Neethu Bal Mallya, Geeta Patil, B. Raveendran","doi":"10.1109/VLSID.2015.16","DOIUrl":"https://doi.org/10.1109/VLSID.2015.16","url":null,"abstract":"This paper proposes a novel cache architecture -- Way Halted Prediction -- to reduce energy consumption and effective access time of set associative caches. This is achieved with the help of halt tag array and prediction circuit. Experimental evaluation of various SPEC benchmark programs on CACTI 5.3 and CASIM simulators reveal that the proposed architecture offers 33%, 6% and 3% savings in dynamic energy consumption and 1.80%, 6.13% and -1.95% saving in effective access time over conventional, way predicting and way halting cache architectures respectively.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130814919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Boolean Tests to Improve Detection of Transistor Stuck-Open Faults in CMOS Digital Logic Circuits","authors":"X. Lin, S. Reddy, J. Rajski","doi":"10.1109/VLSID.2015.73","DOIUrl":"https://doi.org/10.1109/VLSID.2015.73","url":null,"abstract":"Currently transistor stuck-open (TSOP) faults in CMOS digital logic circuits are detected by two pattern tests consisting of an initialization pattern to set the output of a faulty gate followed by a pattern that detects a stuck-at fault. Some TSOP faults may not be detected by such two-pattern tests. One reason for this is that appropriate initialization patterns cannot be obtained using Boolean (steady state) analysis of the circuit. For some of these faults, required initialization may be possible using hazards (glitches) [10][13]. However, insuring that a test using hazard-based initialization actually detects the target fault requires accurate transient analysis of the circuit under test such as by SPICE. In this work we propose methods to augment test generation procedures to detect TSOP faults using traditional steady state Boolean analysis (called Boolean tests in this work). We also investigate the cause for the non-existence of test patterns for the faults not detected in benchmark circuits. In many such cases we found that the non-existence of test patterns is due to redundant gates that can be replaced by a constant 1 or 0. We present results on larger ISCAS-89 benchmark circuits to illustrate the effectiveness of the proposed methods to generate tests to detect TSOP faults and the results of analysis for the non-existence of tests for the remaining faults undetected by Boolean tests.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121265023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Based Scalable Fixed Point QRD Core Using Dynamic Partial Reconfiguration","authors":"G. Prabhu, Bibin Johnson, J. S. Rani","doi":"10.1109/VLSID.2015.64","DOIUrl":"https://doi.org/10.1109/VLSID.2015.64","url":null,"abstract":"This work presents an FPGA based scalable fixed point QRD architecture based on Givens Rotation algorithm. The proposed QRD core utilizes an efficient pipelined and unfolded 2D MAC based systolic array architecture with dynamic partial reconfiguration (DPR) capability. An improved LUT based Newton-Raphson method is proposed for finding square root and inverse square root which helps in reducing the area by 71% and latency by 50%, while operating at a frequency 49% higher than the existing boundary cell architectures. The scalability of the QRD core is achieved using DPR which results in reduction in dynamic power and area utilization as compared to a static implementation. The proposed architecture is implemented on Xilinx Virtex-6 FPGA for any real matrices of size m × n where, 4 ≤ n ≤ 8 and m ≥ n by dynamically inserting or removing the partial modules. The evaluation results shows reduction in latency, area and power as compared to CORDIC based architectures. The proposed scalable QRD core is used for implementing a high performance adaptive equalizer (QRD-RLS Algorithm) used in mobile receiver's and the evaluation is done by transmitting BPSK symbols in the training mode.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121274200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Ternary Content-Addressable Memory (TCAM) Design Using Reversible Logic","authors":"S. D. Kumar, S. Mahammad","doi":"10.1109/VLSID.2015.99","DOIUrl":"https://doi.org/10.1109/VLSID.2015.99","url":null,"abstract":"Content addressable memory is a special type of memory which can do search operation in a single clock cycle. CAM has disadvantages of high power dissipation during the matching operation. Ternary content addressable memory (TCAM) is a special type of memory which is used to search for logic 0, logic 1, logic 'x'. These types of memory are used in routers in order to perform the lookup table function in a single clock cycle. As the use of networks, typified by the Internet, has spread widely in recent years, attention has focused on TCAMs as a key device for increasing the speed of packet forwarding (packet data transfers) by networking equipment by enabling high-speed lookup of destinations, etc., for large volumes of information during packet data transfers. Reversible logic has gained its interest in recent years due to its ultra low power characteristics. Many works have been done to reduce the power consumption in TCAM. This paper deals with a novel design of TCAM cells using reversible logic. The proposed design is optimized in terms of number of garbage outputs and quantum cost. The proposed TCAM cell does the function of the conventional TCAM cell.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114508893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Framework for Selective Flip-Flop Replacement for Soft Error Mitigation","authors":"Pavan Vithal Torvi, V. Devanathan, V. Kamakoti","doi":"10.1109/VLSID.2015.70","DOIUrl":"https://doi.org/10.1109/VLSID.2015.70","url":null,"abstract":"With increasing adoption of newer technologies and architectures targeted for automotive and aviation electronics with an objective to improve performance and/or reduce power/area, soft-error robustness is becoming an important issue to ensure reliable operation for an extended lifetime over a wide range of operating conditions. In this paper, we propose a modeling and optimization framework to systematically improve the FIT (failure-in-time) rate of a design with minimal impact on power, performance and area. We first propose a framework to model and evaluate the relative vulnerability to soft errors of the standard master-slave flip-flops and Dual Interlocked Storage Cells (DICE) in the cell library. Later, we formulate a linear optimization problem using this information to selectively replace the flip-flops so as to improve the FIT rate of the design with minimal impact on area and power. Employing the proposed technique on a popular industrial IP core shows a 32% relative improvement in the design robustness with just 2% increase in design area.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125427472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}