ACM Great Lakes Symposium on VLSI最新文献

High level energy modeling of controller logic in data caches 数据缓存中控制器逻辑的高级能量建模

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591590

P. Panda, Sourav Roy, Srikanth Chandrasekaran, Namita Sharma, Jasleen Kaur, Sarath Kumar Kandalam, N. Nagaraj

引用次数: 3

Horizontal benchmark extension for improved assessment of physical CAD research 提高物理CAD研究评价水平基准扩展

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591540

A. Kahng, Hyein Lee, Jiajia Li

{"title":"Horizontal benchmark extension for improved assessment of physical CAD research","authors":"A. Kahng, Hyein Lee, Jiajia Li","doi":"10.1145/2591513.2591540","DOIUrl":"https://doi.org/10.1145/2591513.2591540","url":null,"abstract":"The rapid growth in complexity and diversity of IC designs, design flows and methodologies has resulted in a benchmark-centric culture for evaluation of performance and scalability in physicaldesign algorithm research. Landmark papers in the literature present vertical benchmarks that can be used across multiple design flow stages; artificial benchmarks with characteristics that mimic those of real designs; artificial benchmarks with known optimal solutions; as well as benchmark suites created by major companies from internal designs and/or open-source RTL. However, to our knowledge, there has been no work on horizontal benchmark creation, i.e., the creation of benchmarks that enable maximal, comprehensive assessments across commercial and academic tools at one or more specific design stages. Typically, the creation of horizontal benchmarks is limited by mismatches in data models, netlist formats, technology files, library granularity, etc. across different tools, technologies, and benchmark suites. In this paper, we describe methodology and robust infrastructure for horizontal benchmark extension\" that permits maximal leverage of benchmark suites and technologies in \"apples-to-apples\" assessment of both industry and academic optimizers. We demonstrate horizontal benchmark extensions, and the assessments that are thus enabled, in two well-studied domains: place-and-route (four combinations of academic placers/routers, and two commercial P&R tools) and gate sizing (two academic sizers, and three commercial tools). We also point out several issues and precepts for horizontal benchmark enablement.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128230054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

A performance enhancing hybrid locally mesh globally star NoC topology 一种增强性能的混合局部网格全局星型NoC拓扑

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591544

T. S. Das, P. Ghosal, S. Mohanty, E. Kougianos

引用次数: 3

A parallel and reconfigurable architecture for efficient OMP compressive sensing reconstruction 一种用于高效OMP压缩感知重构的并行可重构结构

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591598

A. Kulkarni, H. Homayoun, T. Mohsenin

{"title":"A parallel and reconfigurable architecture for efficient OMP compressive sensing reconstruction","authors":"A. Kulkarni, H. Homayoun, T. Mohsenin","doi":"10.1145/2591513.2591598","DOIUrl":"https://doi.org/10.1145/2591513.2591598","url":null,"abstract":"Compressive Sensing (CS) is a novel scheme, in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. However, the signal reconstruction techniques are computationally intensive and power consuming, which make them impractical for embedded applications. This work presents a parallel and reconfigurable architecture for Orthogonal Matching Pursuit (OMP) algorithm, one of the most popular CS reconstruction algorithms. In this paper, we are proposing the first reconfigurable OMP CS reconstruction architecture which can take different image sizes with sparsity up to 32. The aim is to minimize the hardware complexity, area and power consumption, and improve the reconstruction latency while meeting the reconstruction accuracy. First, the accuracy of reconstructed images is analyzed for different sparsity values and fixed point word length reduction. Next, efficient parallelization techniques are applied to reconstruct signals with variant signal lengths of N. The OMP algorithm is mainly divided into three kernels, where each kernel is parallelized to reduce execution time, and efficient reuse of the matrix operators allows us to reduce area. The proposed architecture can reconstruct images of different sizes and measurements and is implemented on a Xilinx Virtex 7 FPGA. The results indicate that, for a 128x128 image reconstruction, the proposed reconfigurable architecture is 2.67x to 1.8x faster than the previous non-reconfigurable work which is less complex and uses much smaller sparsity.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121427332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 27

A comparison of FinFET based FPGA LUT designs 基于FinFET的FPGA LUT设计比较

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591596

M. Abusultan, S. Khatri

{"title":"A comparison of FinFET based FPGA LUT designs","authors":"M. Abusultan, S. Khatri","doi":"10.1145/2591513.2591596","DOIUrl":"https://doi.org/10.1145/2591513.2591596","url":null,"abstract":"The FinFET device has gained much traction in recent VLSI designs. In the FinFET device, the conduction channel is vertical, unlike a traditional bulk MOSFET, in which the conduction channel is planar. This yields several benefits, and as a consequence, it is expected that most VLSI designs will utilize FinFETs from the 20nm node and beyond. Despite the fact that several research papers have reported FinFET based circuit and layout realizations for popular circuit blocks, there has been no reported work on the use of FinFETs for Field Programmable Gate Array (FPGA) designs. The key circuit in the FPGA that enables programmability is the n-input Look-up Table (LUT). An n-input LUT can implement any logic function of up to n inputs. In this paper, we present an evaluation of several FPGA LUT designs. We compare these designs from a performance (delay, power, energy) as well as an area perspective. Comparisons are conducted with respect to a bulk based LUT as well. Our results demonstrate that all the FinFET based LUTs exhibit better delays and energy than the bulk based LUT. Based on our comparisons, we have two winning candidate LUTs, one for high performance designs (3X faster than a bulk based LUT) and another for low energy, area constrained designs (83% energy and 58% area compared to a bulk based LUT).","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115937627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Neural network-based accelerators for transcendental function approximation 基于神经网络的超越函数逼近加速器

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591534

Schuyler Eldridge, F. Raudies, D. Zou, A. Joshi

引用次数: 25

Reconfigurable STT-NV LUT-based functional units to improve performance in general-purpose processors 可重构的基于STT-NV lut的功能单元，以提高通用处理器的性能

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591535

Adarsh Reddy Ashammagari, H. Mahmoodi, T. Mohsenin, H. Homayoun

{"title":"Reconfigurable STT-NV LUT-based functional units to improve performance in general-purpose processors","authors":"Adarsh Reddy Ashammagari, H. Mahmoodi, T. Mohsenin, H. Homayoun","doi":"10.1145/2591513.2591535","DOIUrl":"https://doi.org/10.1145/2591513.2591535","url":null,"abstract":"Unavailability of functional units is a major performance bottleneck in general-purpose processors (GPP). In a GPP with limited number of functional units while a functional unit may be heavily utilized at times, creating a performance bottleneck, the other functional units might be under-utilized. We propose a novel idea for adapting functional units in GPP architecture in order to overcome this challenge. For this purpose, a selected set of complex functional units that might be under-utilized such as multiplier and divider, are realized using a programmable look up table-based fabric. This allows for run-time adaptation of functional units to improving performance. The programmable look up tables are realized using magnetic tunnel junction (MTJ) based memories that dissipate near zero leakage and are CMOS compatible. We have applied this idea to a dual issue architecture. The results show that compared to a design with all CMOS functional units a performance improvement of 18%, on average is achieved for standard benchmarks. This comes with 4.1% power increase in integer benchmarks and 2.3% power decrease in floating point benchmarks, compared to a CMOS design.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115837651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Trade-off between energy and quality of service through dynamic operand truncation and fusion 通过动态操作数截断和融合实现能量和服务质量的权衡

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591561

Wenchao Qian, Robert Karam, S. Bhunia

{"title":"Trade-off between energy and quality of service through dynamic operand truncation and fusion","authors":"Wenchao Qian, Robert Karam, S. Bhunia","doi":"10.1145/2591513.2591561","DOIUrl":"https://doi.org/10.1145/2591513.2591561","url":null,"abstract":"Energy efficiency has emerged as a major design concern for embedded and portable electronics. Conventional approaches typically impact performance and often require significant design-time modifications. In this paper, we propose a novel approach for improving energy efficiency through judicious fusion of operations. The proposed approach has two major distinctions: (1) the fusion is enabled by operand truncation, which allows representing multiple operations into a reasonably sized lookup table (LUT); and (2) it works for large varieties of functions. Most applications in the domain of digital signal processing (DSP) and graphics can tolerate some computation error without large degradation in output quality. Our approach improves energy efficiency with graceful degradation in quality. The proposed fusion approach can be applied to trade-off energy efficiency with quality at run time and requires virtually no circuit or architecture level modifications in a processor. Using our software tool for automatic fusion and truncation, the effectiveness of the approach is studied for four common applications. Simulation results show promising improvements (19-90%) in energy delay product with minimal impact on quality.","PeriodicalId":272619,"journal":{"name":"ACM Great Lakes Symposium on VLSI","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114866826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Forward-scaling, serially equivalent parallelism for FPGA placement FPGA放置的前向缩放、串行等效并行性

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591543

C. Fobel, G. Grewal, D. Stacey

引用次数: 3

A current-mode CMOS/memristor hybrid implementation of an extreme learning machine 一种电流模式CMOS/忆阻器混合实现的极限学习机

ACM Great Lakes Symposium on VLSI Pub Date : 2014-05-20 DOI: 10.1145/2591513.2591572

Cory E. Merkel, D. Kudithipudi

引用次数: 19