2015 28th International Conference on VLSI Design: Latest Papers, Page 8

Bandwidth Adaptive Nanophotonic Crossbars with Clockwise/Counter-clockwise Optical Routing

2015 28th International Conference on VLSI Design Pub Date : 2015-01-01 DOI: 10.1109/VLSID.2015.26

M. Kennedy, Avinash Karanth Kodi

Abstract: Future processors are anticipated to have hundreds or even thousands of processing cores placed on a single silicon chip. The increasing core count presents new challenges, pushing researchers to explore opportunities in emerging technologies such as on-chip silicon nanophotonics. Nanophotonic technology has created a unique landscape for new interconnect designs. Among the many architectures made possible by nanophotonics, there has been notable interest in crossbar topologies that were previously impractical using only electrical components. In this paper, we present a new nanophotonic crossbar interconnect architecture that aims to retain the low-latency, single-hop characteristic of the crossbar topology while improving the network's utilization of the static laser source, whose power is often wasted on insertion losses and unused bandwidth. We compare our architecture to other proposed architectures in terms of area, power consumption, throughput, and latency. On synthetic traffic patterns, throughput improves by approximately 13% compared to other optical crossbar topologies and by 92% compared to a conventional electrical flattened butterfly topology.

Citations: 5

EvoDeb: Debugging Evolving Hardware Designs

2015 28th International Conference on VLSI Design Pub Date : 2015-01-01 DOI: 10.1109/VLSID.2015.87

Debjyoti Bhattacharjee, A. Banerjee, A. Chattopadhyay

Citations: 5

Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.31

Farhad Merchant, Arka Maity, Mahesh Mahadurkar, Kapil Vatwani, Ishan Munje, C. MadhavaKrishna, N. Sivanandan, N. Gopalan, S. Raha, S. Nandy, R. Narayan

Abstract: LU and QR factorizations are the computationally expensive part of many applications, ranging from large-scale simulations (e.g., computational fluid dynamics) to augmented reality. These factorizations have O(n^3) time complexity and are difficult to accelerate because they combine bandwidth-bound kernels (BLAS-1 or BLAS-2, i.e., level-1 or level-2 Basic Linear Algebra Subprograms) with compute-bound kernels (BLAS-3, level-3 BLAS). Meanwhile, Coarse Grained Reconfigurable Architectures (CGRAs) have gained tremendous popularity as accelerators in embedded systems due to their flexibility and ease of use. Provisioning these accelerators in High Performance Computing (HPC) platforms remains a research challenge. We consider a CGRA environment in which several Compute Elements (CEs) enhanced with Custom Functional Units (CFUs) are interconnected over a Network-on-Chip (NoC). In this paper, we carry out extensive micro-architectural exploration for accelerating core kernels such as Matrix Multiplication (MM) (BLAS-3) for LU and QR factorizations. Our five design enhancements reduce the latency of BLAS-3 kernels. On a stand-alone CFU, we achieve up to 8x speed-up for MM, and a commensurate improvement is observed for MM in a CGRA environment. We achieve better GFLOPS/mm^2 than recent implementations.

Citations: 14

Implementation of NOR Logic Based on Material Implication on CMOL FPGA Architecture

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.94

P. Mane, Nishil Talati, Ameya Riswadkar, Bhavan Jasani, C. K. Ramesha

Citations: 4

A High-Performance Energy-Efficient Hybrid Redundant MAC for Error-Resilient Applications

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.65

Sunil Dutt, Anshu Chauhan, Rahul Bhadoriya, Sukumar Nandi, G. Trivedi

Abstract: In the majority of Digital Signal Processing (DSP) applications, such as image, audio, and video processing, the final result is interpreted by human senses, and the limited perception of those senses relaxes the strict requirement on accuracy. Thus, by adopting the emerging concept of approximate computing, we propose an approximate radix-2 hybrid redundant Multiply-and-Accumulate (Approx MAC) unit, which gives rise to novel Speed-Power-Accuracy-Area (SPAA) metrics. The Approx MAC unit attains tremendous improvements in computational performance, energy efficiency, and silicon area with only a trivial degradation in output quality. To examine the effectiveness of the proposed approach in real-time DSP applications, we demonstrate a JPEG-E-X IP core architecture with an embedded Approx MAC unit. The Approx MAC unit with 40 approximate LSBs achieves 7.177x and 1.526x speedup, 1.594x and 4.163x energy efficiency, and 1.131x and 1.277x silicon area improvements over binary and hybrid redundant MAC units, respectively. Moreover, the Approx MAC unit with 40 approximate LSBs improves the power-precision and delay-precision metrics by 14.71% and 32.95%, respectively.

Citations: 6

Way Halted Prediction Cache: An Energy Efficient Cache Architecture for Embedded Processors

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.16

Neethu Bal Mallya, Geeta Patil, B. Raveendran

Citations: 6

Using Boolean Tests to Improve Detection of Transistor Stuck-Open Faults in CMOS Digital Logic Circuits

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.73

X. Lin, S. Reddy, J. Rajski

Abstract: Currently, transistor stuck-open (TSOP) faults in CMOS digital logic circuits are detected by two-pattern tests consisting of an initialization pattern that sets the output of the faulty gate, followed by a pattern that detects a stuck-at fault. Some TSOP faults may not be detected by such two-pattern tests. One reason is that appropriate initialization patterns cannot be obtained using Boolean (steady-state) analysis of the circuit. For some of these faults, the required initialization may be possible using hazards (glitches) [10][13]. However, ensuring that a test using hazard-based initialization actually detects the target fault requires accurate transient analysis of the circuit under test, for example with SPICE. In this work we propose methods to augment test generation procedures to detect TSOP faults using traditional steady-state Boolean analysis (the resulting tests are called Boolean tests in this work). We also investigate why test patterns do not exist for the faults not detected in benchmark circuits. In many such cases we found that the non-existence of test patterns is due to redundant gates that can be replaced by a constant 1 or 0. We present results on the larger ISCAS-89 benchmark circuits to illustrate the effectiveness of the proposed methods in generating tests for TSOP faults, along with an analysis of why no tests exist for the remaining faults undetected by Boolean tests.
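The two-pattern idea in this abstract can be illustrated with a small behavioural model. The sketch below is not the authors' test generator; it is a minimal Python model of a single 2-input CMOS NAND gate, assuming that a floating output node simply retains its previous value. The function names (`nand2`, `run_two_pattern`) and the fault labels (`pA`, `pB`, `nA`, `nB`) are hypothetical.

```python
def nand2(a, b, prev_out, open_fault=None):
    """Behavioural 2-input CMOS NAND with an optional stuck-open transistor.

    open_fault is one of None, "pA", "pB", "nA", "nB" (hypothetical labels).
    """
    # Pull-up network: two pMOS transistors in parallel, each on when its input is 0.
    pull_up = (a == 0 and open_fault != "pA") or (b == 0 and open_fault != "pB")
    # Pull-down network: two nMOS transistors in series, both must conduct.
    pull_down = a == 1 and b == 1 and open_fault not in ("nA", "nB")
    if pull_up:
        return 1
    if pull_down:
        return 0
    # Neither network conducts: the output floats and keeps its old value
    # (modelling assumption, ignores leakage and charge sharing).
    return prev_out


def run_two_pattern(init, test, open_fault=None, start=0):
    """Apply an initialization pattern, then a test pattern; return the final output."""
    out = nand2(*init, start, open_fault)
    return nand2(*test, out, open_fault)


good = run_two_pattern((1, 1), (0, 1))                   # fault-free response
bad = run_two_pattern((1, 1), (0, 1), open_fault="pA")   # pMOS on input A stuck open
missed = run_two_pattern((0, 0), (0, 1), open_fault="pA")  # wrong initialization
print(good, bad, missed)
```

With the initialization pattern (1, 1) the output is first driven to 0, so the faulty gate, which can no longer pull up through the open transistor, holds 0 while the fault-free gate produces 1. With the initialization (0, 0) the faulty gate holds the same value as the fault-free one and the fault escapes, which is why the choice of initialization pattern matters.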

Citations: 7

FPGA Based Scalable Fixed Point QRD Core Using Dynamic Partial Reconfiguration

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.64

G. Prabhu, Bibin Johnson, J. S. Rani

Abstract: This work presents an FPGA-based scalable fixed-point QRD architecture based on the Givens rotation algorithm. The proposed QRD core uses an efficient pipelined and unfolded 2D MAC-based systolic array architecture with dynamic partial reconfiguration (DPR) capability. An improved LUT-based Newton-Raphson method is proposed for computing the square root and inverse square root, which reduces area by 71% and latency by 50% while operating at a frequency 49% higher than existing boundary cell architectures. The scalability of the QRD core is achieved using DPR, which reduces dynamic power and area utilization compared to a static implementation. The proposed architecture is implemented on a Xilinx Virtex-6 FPGA for real matrices of size m × n, where 4 ≤ n ≤ 8 and m ≥ n, by dynamically inserting or removing the partial modules. The evaluation results show reductions in latency, area, and power compared to CORDIC-based architectures. The proposed scalable QRD core is used to implement a high-performance adaptive equalizer (QRD-RLS algorithm) for mobile receivers, and the evaluation is done by transmitting BPSK symbols in training mode.
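The abstract does not give the details of its improved LUT-based Newton-Raphson method, so the following is only a generic sketch of the underlying idea: a small lookup table supplies a seed for 1/sqrt(x), and a few Newton-Raphson iterations refine it. It uses floating point rather than fixed point, and the table size (`LUT_BITS`), the normalization range [1, 4), and the function names are all assumptions made for illustration.

```python
import math

LUT_BITS = 4                      # assumed table size: 16 entries
LUT_SIZE = 1 << LUT_BITS
# Seed values for 1/sqrt(m), sampled at the midpoint of each sub-interval of [1, 4).
LUT = [1.0 / math.sqrt(1.0 + (i + 0.5) * 3.0 / LUT_SIZE) for i in range(LUT_SIZE)]


def inv_sqrt(x, iterations=2):
    """Approximate 1/sqrt(x) for x > 0 with a LUT seed and Newton-Raphson steps."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Normalize x = m * 4**e with m in [1, 4), so 1/sqrt(x) = (1/sqrt(m)) * 2**(-e).
    m, e = x, 0
    while m >= 4.0:
        m /= 4.0
        e += 1
    while m < 1.0:
        m *= 4.0
        e -= 1
    index = min(int((m - 1.0) * LUT_SIZE / 3.0), LUT_SIZE - 1)
    y = LUT[index]
    # Newton-Raphson on f(y) = 1/y**2 - m gives y <- y * (1.5 - 0.5 * m * y * y).
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)
    return y * 2.0 ** (-e)


def approx_sqrt(x, iterations=2):
    """sqrt(x) obtained from the inverse square root: sqrt(x) = x * (1/sqrt(x))."""
    return x * inv_sqrt(x, iterations)


print(inv_sqrt(2.0), approx_sqrt(9.0))
```

The iteration uses only multiplications and subtractions, which is one reason this form is commonly preferred in hardware over a division-based square-root update; a larger table reduces the number of iterations needed for a given precision.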

Citations: 10

A Novel Ternary Content-Addressable Memory (TCAM) Design Using Reversible Logic

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.99

S. D. Kumar, S. Mahammad

Abstract: Content-addressable memory (CAM) is a special type of memory that can perform a search operation in a single clock cycle. CAM has the disadvantage of high power dissipation during the matching operation. Ternary content-addressable memory (TCAM) is a special type of memory that can search for logic 0, logic 1, and logic 'x' (don't care). These memories are used in routers to perform the lookup-table function in a single clock cycle. As the use of networks, typified by the Internet, has spread widely in recent years, attention has focused on TCAMs as a key device for speeding up packet forwarding in networking equipment, since they enable high-speed lookup of destinations and similar fields across large volumes of information during packet data transfers. Reversible logic has attracted interest in recent years due to its ultra-low-power characteristics. Much work has been done to reduce power consumption in TCAM. This paper presents a novel design of TCAM cells using reversible logic. The proposed design is optimized in terms of the number of garbage outputs and quantum cost. The proposed TCAM cell performs the function of a conventional TCAM cell.
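The ternary matching function that a conventional TCAM implements can be sketched in software. This is only a functional model of the lookup, not the reversible-logic cell proposed in the paper; the pattern syntax (strings of `0`, `1`, `x`) and the names `parse_pattern`, `matches`, and `tcam_search` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def parse_pattern(pattern: str) -> Tuple[int, int]:
    """Convert a string such as '10x1' into (value, care_mask) integers."""
    value = 0
    care = 0
    for ch in pattern:
        value <<= 1
        care <<= 1
        if ch == "1":
            value |= 1
            care |= 1
        elif ch == "0":
            care |= 1
        elif ch not in ("x", "X"):
            raise ValueError("pattern may contain only 0, 1, x")
    return value, care


def matches(key: int, value: int, care: int) -> bool:
    """A stored word matches when every cared-about bit equals the key bit."""
    return ((key ^ value) & care) == 0


def tcam_search(table: List[Tuple[int, int]], key: int) -> Optional[int]:
    """Return the index of the first matching entry, or None if nothing matches."""
    # Hardware compares all rows in parallel; this loop models the
    # priority encoder that selects the lowest-index match.
    for index, (value, care) in enumerate(table):
        if matches(key, value, care):
            return index
    return None


table = [parse_pattern(p) for p in ["1101", "10x1", "xxxx"]]
print(tcam_search(table, 0b1011))
```

Placing more specific patterns before more general ones, as in the example table, is how a router-style lookup gets the most specific rule to win.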

Citations: 2

Framework for Selective Flip-Flop Replacement for Soft Error Mitigation

2015 28th International Conference on VLSI Design Pub Date : 1900-01-01 DOI: 10.1109/VLSID.2015.70

Pavan Vithal Torvi, V. Devanathan, V. Kamakoti

Citations: 5