2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Papers

MINLP Based Power Optimization for Pipelined ADC
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.64
A. Purushothaman
Abstract: This paper proposes a Mixed Integer Non-linear Programming (MINLP) based optimization algorithm for designing power-optimized pipelined ADCs. For a given specification, the proposed algorithm gives the stage resolution and sampling capacitor per stage that minimize the total power consumption. Closed-form expressions for the power consumption of each stage were derived and used as the objective function. Pipelined ADCs of various specifications, viz., 10-bit, 12-bit, and 16-bit, were designed and validated using this algorithm.
Citations: 9
The Impact of Heterogeneity on a Reconfigurable Multicore System
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.67
Rafael Fão de Moura, J. D. Souza, L. Carro, A. C. S. Beck, M. B. Rutzig
Abstract: Modern embedded systems must efficiently exploit parallelism at the thread and instruction level to achieve the best performance with the lowest possible energy consumption. While Multiprocessor Systems-on-Chip (MPSoCs) are a commonly used solution, they do not provide an effective environment for software production, as each processing element implements a different Instruction Set Architecture (ISA). On the other hand, processors such as the ARM big.LITTLE comprise multicores with different organizations and the same ISA. However, such cores are power-hungry superscalar microarchitectures. Dynamic Reconfigurable Architectures (DRAs) emerge as a solution to fill this gap. By taking advantage of their regular fabric, it is possible to develop a low-energy heterogeneous system by coupling to the cores DRAs that have different processing capabilities and implement the same ISA. In this work, we evaluate such a system, varying both the size of the DRAs and the memory system involved. We show that, by tuning the latter, one can reach energy savings of up to 36%, while a fully heterogeneous system yields energy savings of 28% and a performance loss of 7% compared to its homogeneous counterpart.
Citations: 1
Approximate Adder with Hybrid Prediction and Error Compensation Technique
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.16
Xinghua Yang, Yue Xing, F. Qiao, Qi Wei, Huazhong Yang
Abstract: This paper proposes an approximate adder to accelerate computation and reduce energy consumption for error-resilient applications, at the cost of a moderate loss in output quality. The computation acceleration comes from the prediction scheme for the adder circuit, in which the critical path is divided into multiple short fragments so that the addition can proceed in parallel. The energy consumption is reduced by trimming the registers from the lower predictors of the design. Furthermore, a simple error-compensation module is inserted into the approximate part of the circuit to decrease the relative error at very little hardware cost. Simulated in a 65nm CMOS process, the design achieves a 2.82X speedup and a 57.8% energy-efficiency improvement compared with traditional adders. Compared with current high-performance approximate adders, the proposed adder shows 6.9% energy savings with a two-order-of-magnitude reduction in relative error using random test data. Finally, the proposed approximate adder is adopted in DCT processing, where a PSNR increase of more than 10dB can be achieved compared with current counterpart designs.
Citations: 11
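The idea in the approximate adder entry above, cutting a long carry chain into short fragments that can be added in parallel, can be illustrated with a generic block-based speculative adder. This is a minimal sketch and not the paper's hybrid prediction or error-compensation scheme: the function name `segmented_add`, the 16-bit width, the 4-bit block size, and the rule of predicting each block's carry-in from the previous block alone are all assumptions made for illustration.

```python
def segmented_add(a: int, b: int, width: int = 16, block: int = 4) -> int:
    """Approximate a + b (mod 2**width) by cutting the carry chain into blocks.

    Hypothetical model: the carry into each block is predicted from the
    previous block's operand bits only, ignoring that block's own carry-in.
    Assumes width is a multiple of block.
    """
    mask = (1 << block) - 1
    result = 0
    for shift in range(0, width, block):
        block_a = (a >> shift) & mask
        block_b = (b >> shift) & mask
        if shift == 0:
            carry_in = 0  # the lowest block has no incoming carry
        else:
            prev_a = (a >> (shift - block)) & mask
            prev_b = (b >> (shift - block)) & mask
            # Predicted carry: what the previous block generates on its own.
            carry_in = (prev_a + prev_b) >> block
        # Each block is summed independently, so blocks could run in parallel.
        result |= ((block_a + block_b + carry_in) & mask) << shift
    return result
```

With these assumptions the result is exact whenever no carry has to travel across more than one block boundary. For example, `segmented_add(0x00FF, 0x0001)` returns `0x0000` instead of the exact `0x0100`, because the carry that ripples through the second block is not seen by the third. Errors of this kind are what an error-compensation stage would aim to reduce.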
A Gracefully Degrading and Energy-Efficient Fault Tolerant NoC Using Spare Core
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.80
B. N. K. Reddy, M. H. Vasantha, Kumar Y. B. Nithin
Abstract: Reliability is a significant concern for modern multi-core embedded systems. On-chip communication systems are vulnerable to permanent network faults and transient faults, which may significantly affect system performance. Targeting a fault-tolerance solution for faulty cores in a Network-on-Chip (NoC), this paper proposes an energy-efficient fault-tolerant NoC architecture using a spare core. The proposed strategy comprises finding the smallest rectangular region in which to place the given application using a heuristic technique, mapping vertices within the selected region, and selecting the region that yields maximum overall performance and minimum communication energy. A spare core is placed within the region and connected to the vertices. Many application core graphs are used to evaluate the proposed technique. Simulation results from many fault-injection tests indicate that the proposed technique improves performance while also saving communication energy.
Citations: 62
Mod (2P-1) Shuffle Memory-Access Instructions for FFTs on Vector SIMD DSPs
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.71
Sheng Liu, Haiyan Chen, Jianghua Wan, Yaohua Wang
Abstract: The Binary Exchange Algorithm (BEA) always introduces excessive shuffle operations when mapping FFTs onto vector SIMD DSPs, which can greatly restrict overall performance. We propose a novel mod (2P-1) shuffle function and a Mod-BEA algorithm (MBEA), which can halve the shuffle operation count and unify the shuffle mode. This unified shuffle mode inspires us to propose a set of novel mod (2P-1) shuffle memory-access instructions, which can eliminate the shuffle operations entirely. Experimental results show that the combination of MBEA and the proposed instructions brings 17.2%-31.4% performance improvements at reasonable hardware cost and compresses the code size by about 30%.
Citations: 2
Dynamic Per-Warp Reconvergence Stack for Efficient Control Flow Handling in GPUs
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.35
Yaohua Wang, Xiaowen Chen, Dong Wang, Sheng Liu
Abstract: GPGPUs usually experience performance degradation when the control flow of threads in a warp diverges. The reconvergence-stack-based control flow handling scheme is widely adopted in GPU architectures. The depth of such a stack is always set to a large number so that there are enough entries for warps experiencing nested branches. However, for warps experiencing simple branches or no branches at all, those deep reconvergence stacks stay idle, causing a serious waste of hardware resources. Moreover, as GPU architectures develop, more and more warps will be deployed on a GPU stream processor core, and this problem could become even more serious. To solve this problem, this paper proposes a dynamic reconvergence stack structure in which a stack pool is shared by all the warps and dynamic stacks for different warps can be constructed according to run-time requirements. This satisfies the stack requirement while eliminating unnecessary waste of hardware resources. Our experiments show that the dynamic reconvergence stack can reduce the cost of the stack by 50% while preserving the performance of the conventional scheme.
Citations: 3
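The shared stack pool described in the entry above can be pictured with a small software model. This is a sketch under stated assumptions rather than the paper's hardware design: the class name `SharedStackPool`, the linked-entry organization, the entry fields (reconvergence PC, next PC, active mask), and the choice to simply reject a push when the pool is full are all hypothetical.

```python
class SharedStackPool:
    """Hypothetical model of reconvergence-stack entries shared by all warps."""

    def __init__(self, num_entries: int):
        self.free = list(range(num_entries))   # indices of unused entries
        self.entries = [None] * num_entries    # (reconv_pc, next_pc, active_mask)
        self.link = [None] * num_entries       # entry below this one in the same stack
        self.top = {}                          # warp id -> index of its top entry

    def push(self, warp: int, reconv_pc: int, next_pc: int, mask: int) -> bool:
        if not self.free:
            return False  # pool exhausted; a real design needs a stall or spill policy
        idx = self.free.pop()
        self.entries[idx] = (reconv_pc, next_pc, mask)
        self.link[idx] = self.top.get(warp)    # chain onto this warp's current top
        self.top[warp] = idx
        return True

    def pop(self, warp: int):
        idx = self.top.get(warp)
        if idx is None:
            raise IndexError("warp has no stack entries")
        entry, below = self.entries[idx], self.link[idx]
        if below is None:
            del self.top[warp]
        else:
            self.top[warp] = below
        self.entries[idx] = None
        self.link[idx] = None
        self.free.append(idx)                  # entry returns to the shared pool
        return entry

    def depth(self, warp: int) -> int:
        count, idx = 0, self.top.get(warp)
        while idx is not None:
            count, idx = count + 1, self.link[idx]
        return count
```

In this model a warp with deeply nested branches can hold many entries while a branch-free warp holds none, so the pool can be sized for the expected total demand instead of the worst case per warp.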
Design Optimization of Register File Throughput and Energy Using a Virtual Prototyping (ViPro) Tool
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.50
Ningxi Liu, B. Calhoun
Abstract: Register files (RFs) consume significant power in low-power processors, and their specifications vary substantially across applications. Challenges exist in identifying the appropriate RF design and optimizing RFs for different specifications. This paper not only explores methodologies for designing low-power, high-performance RFs but also extends a virtual prototyping (ViPro) tool to support fast and efficient estimation of the impact of different design knobs on the overall multi-port RF macros. To enable aggressive exploration of RF design, three bitline (BL) sensing schemes are included in ViPro along with parasitic parameters extracted from layout. ViPro results are accurate to within 15% compared to full RF schematic SPICE simulation, while ViPro simulation is 5-10 times faster. An example shows how ViPro can optimize RF design for various specifications in a 45nm CMOS technology. With proper BL sensing techniques, data throughput for 1R/1W port RFs improves by 31% and 72% at 0.5KB and 512KB, respectively. Results also show that the optimal BL sensing scheme changes with memory capacity. At 0.5KB, the lowest energy per operation decreases by 7.5% with a single-ended BL, while at 512KB the energy reduction is 45% with a hierarchical BL.
Citations: 1
An Accurate All CMOS Temperature Sensor for IoT Applications
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.113
Sunil Kumar Maddikatla, S. Jandhyala
Abstract: This manuscript proposes an area-efficient, linear, robust CMOS integrated temperature sensor circuit in multiple technology nodes using the UMC RF process for IoT and low-cost SoC applications. In the UMC 180nm node, the proposed temperature sensor has an accuracy of ±0.4°C over 3σ variation in process and ±10% variation in supply, in the temperature range -55°C to 125°C. In the 65nm node, it has an accuracy of ±0.6°C over 3σ variation in process and ±10% variation in supply, in the temperature range -55°C to 125°C. The proposed design achieves a highly linear, proportional to absolute temperature (PTAT) voltage with reduced process-corner dependence, using a process-invariant circuit in conjunction with a supply-independent biasing circuit. The supply sensitivity of the output voltage is 1100 ppm/V, and the spread with process is limited to ±0.6°C at UMC 180nm and ±1.5°C at 65nm technology. The proposed sensor in UMC 180nm technology occupies an area of 0.002 mm² and consumes 108μW of power. The output voltage is 136mV at room temperature (27°C) in the typical corner, with a slope of 0.650mV/°C. The temperature sensor is included in a micro gyroscope application, and the effect of temperature on the angular frequency at zero bias is presented.
Citations: 9
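The figures quoted in the temperature sensor entry above (136mV at 27°C in the typical corner, slope 0.650mV/°C) are enough to sketch how a reading might be converted to a temperature. The sketch below assumes an ideal straight-line PTAT response across the whole -55°C to 125°C range and ignores the stated accuracy, process, and supply spreads; the function and constant names are hypothetical.

```python
V_REF_MV = 136.0            # output voltage at the reference point, from the abstract
T_REF_C = 27.0              # reference (room) temperature, from the abstract
SLOPE_MV_PER_C = 0.650      # PTAT slope, from the abstract


def ptat_voltage_mv(temp_c: float) -> float:
    """Ideal sensor output in mV at temp_c, assuming a perfectly linear response."""
    return V_REF_MV + SLOPE_MV_PER_C * (temp_c - T_REF_C)


def temperature_c(voltage_mv: float) -> float:
    """Invert the linear model: estimate temperature in degC from a reading in mV."""
    return T_REF_C + (voltage_mv - V_REF_MV) / SLOPE_MV_PER_C
```

Under this linear assumption the output would span roughly 82.7mV at -55°C to 199.7mV at 125°C, which gives a sense of the input range a downstream ADC or comparator would need to cover.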
Workload-Aware Power Gating Design and Run-Time Management for Massively Parallel GPGPUs
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.60
K. Dev, S. Reda, Indrani Paul, Wei Huang, W. Burleson
Abstract: Power gating (PG) is an effective technique for improving power efficiency. Future general-purpose graphics processing units (GPGPUs) will likely feature hundreds of compute units (CUs) and be power constrained, which poses serious challenges to existing PG methodologies. In this paper, we propose novel design-time and run-time techniques to effectively implement power gating in future GPGPUs. Based on industrial models and measurement facilities, we show that designers must consider run-time parallelism within potential applications when implementing power gating designs to avoid incurring unnecessary design overheads. By scaling measurements from a real 28nm GPGPU to a hypothetical future 10nm node, we show that a PG granularity of 16 CUs per cluster achieves 99% of peak run-time performance without the excessive 53% design-time area overhead of per-CU PG. We also demonstrate that a run-time power management algorithm that is aware of the PG granularity yields up to 18% additional performance through frequency boosting under thermal design power (TDP) constraints.
Citations: 8
SoC, NoC and Hierarchical Bus Implementations of Applications on FPGAs Using the FCUDA Flow
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2016-07-01 DOI: 10.1109/ISVLSI.2016.131
T. Nguyen, Yao Chen, K. Rupnow, S. Gurumani, Deming Chen
Abstract: The FCUDA project aims to improve the programmability of FPGAs and the expression of application parallelism in High Level Synthesis (HLS) through the use of the CUDA language. CUDA is a popular single-instruction, multiple-data (SIMD) style programming language with wide adoption, and thus offers a significant opportunity to bring experienced programmers to FPGA computing. The FCUDA project has now open-sourced the core CUDA-to-RTL transformation as well as the infrastructure for design space exploration, bus-based and NoC-based on-chip communication, and platform integration with Xilinx's SoC systems. In this paper, we present FCUDA's design space exploration, interconnect, and platform integration, and give guidelines for selecting the system-level infrastructure that yields the best implementation of an application.
Citations: 3