{"title":"Design and Analysis of Novel InSb/Si Heterojunction Double Gate Tunnel Field Effect Transistor","authors":"S. Ahish, D. Sharma, M. H. Vasantha, Kumar Y. B. Nithin","doi":"10.1109/ISVLSI.2016.52","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.52","url":null,"abstract":"In this work, an InSb/Si heterojunction hetero gate dielectric double gate TFET (HTFET) having a split pocket at the Source-Channel junction has been designed and its analog/RF performance has been investigated. The analog/RF performance of the device is analysed in terms of I-V characteristics, transconductance (gm), parasitic capacitances, cut-off frequency (fT) and gain bandwidth product (GBW). Maximum fT of 777.8 GHz, maximum GBW of 393 GHz and an ION/IOFF ratio of 10^10 were obtained from the simulations carried out. Further, circuit level performance analysis is performed by implementing a common source (CS) amplifier based on HTFET, using a look-up table based Verilog-A model; a 3-dB roll-off frequency of 55.0981 GHz and unity gain cut-off frequency of 1.4652 THz were achieved.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129044652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SecCheck: A Trustworthy System with Untrusted Components","authors":"Rajshekar Kalayappan, S. Sarangi","doi":"10.1109/ISVLSI.2016.31","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.31","url":null,"abstract":"Mission critical applications face a security risk when they use third-party ICs for their speed and/or technology benefits. SecCheck is an architectural framework that securely incorporates fast, untrusted third-party cores (3PCs). It takes a comprehensive approach, providing for all of the different traditional fault tolerance techniques, to verify the 3PCs' functioning. The verification is done at run-time by slow, trusted, homegrown cores (HGCs). The overhead of providing security is reduced through intelligent scheduling exploiting task-level parallelism. The average performance penalty for achieving security under SecCheck is just 10-17% (optimal schedule), even when the HGCs are only half as fast as the 3PCs. We also devise a heuristic-based scheduler that is 500X faster than an ILP-based optimal one, with a relative penalty less than 1%.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128877991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Cost VLSI Architecture for Sample Adaptive Offset Encoder in HEVC","authors":"Sayed El Gendy, A. Shalaby, M. Sayed","doi":"10.1109/ISVLSI.2016.78","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.78","url":null,"abstract":"Sample Adaptive Offset (SAO) has been adopted as a new in-loop filtering block in High Efficiency Video Coding (HEVC). It can increase compression efficiency by up to 23%, especially for sequences that contain computer graphics content. To get the optimum SAO parameters, exhaustive operations are required because of the huge number of samples which the encoder has to study. In this work, a low cost high throughput VLSI implementation for the parameter estimation (encoding) phase is proposed. The proposed novel architecture reduces the cost in terms of gate count by 47% in comparison with prior work. The proposed design is prototyped using 65 nm CMOS technology. It has 89.3 Kgates, 8832 bits SRAM, and a maximum clock frequency of 426 MHz. It can support real time 8K×4K@120fps videos at 378 MHz.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"971 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123079693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architecture Exploration for Energy-Efficient Embedded Vision Applications: From General Purpose Processor to Domain Specific Accelerator","authors":"Maria Malik, Farnoud Farahmand, P. Otto, N. Akhlaghi, T. Mohsenin, S. Sikdar, H. Homayoun","doi":"10.1109/ISVLSI.2016.112","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.112","url":null,"abstract":"OpenCV applications are computationally intensive tasks among computer vision algorithms. The demand for low power yet high performance real-time processing of OpenCV embedded vision applications has led to developing their customized implementations on state-of-the-art embedded processing platforms. Given the industry move to heterogeneous platforms which integrate single core or multicore CPUs with on-chip FPGA accelerators and GPU accelerators, the question of what platform and what implementation, whether hardware or software, is best suited for energy-efficient processing of this class of applications is becoming important. In this paper, we seek to answer this question through a detailed hardware and software implementation of OpenCV applications and methodical measurement and comprehensive analysis of their power and performance on state-of-the-art heterogeneous embedded processing platforms. The results show that in addition to application behavior, the size of the image is an important factor in deciding the efficient platform in terms of highest energy-efficiency (EDP) among hardware accelerators on FPGA and software accelerators on GPU and multicore CPUs. While the hardware implementation on ZYNQ is shown to be the most performance and energy-efficient for image sizes of 500x500 or less, the software GPU implementation is found to be the most efficient and achieves the highest speedup for larger image sizes. In addition, while for compute intensive vision applications the gap between FPGA, CPU and GPU reduces as the size of the image increases, for non-intensive applications, a large performance and EDP gap is observed between the studied platforms, as the size of the image increases.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123090613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Soft-Error-Rate (SER) Estimation for Combinational Logic and Sequential Elements","authors":"Ji Li, J. Draper","doi":"10.1109/ISVLSI.2016.28","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.28","url":null,"abstract":"With drastic device shrinking, low operating voltages, increasing complexities, and high speed operations, radiation-induced soft errors have posed an ever increasing reliability challenge to both combinational and sequential circuits in advanced CMOS technologies. Therefore, it is imperative to devise efficient soft error rate (SER) estimation methods, in order to evaluate the soft error vulnerabilities for cost-effective robust circuit design. Previous works either analyze only SER in combinational circuits or evaluate soft error vulnerabilities in sequential elements. In this paper, a joint SER estimation framework is proposed, which considers single-event transients (SETs) in combinational logic and multiple cell upsets (MCUs) in sequential components. Various masking effects are considered in the combinational SER estimation process, and several typical radiation-hardened and non-hardened flip-flop structures are analyzed and compared as the sequential elements. A schematic and layout co-simulation approach is proposed to model the MCUs for redundant sequential storage structures. Experimental results of a variety of ISCAS benchmark circuits using the Nangate 45nm CMOS standard cell library demonstrate the difference in soft error resilience among designs using different sequential elements and the importance of modeling MCUs in redundant structures.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133942602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gate Overdrive with Split-Circuit Biasing to Substitute for Body Biasing in FinFET and UTB FDSOI Circuits","authors":"Andrew Whetzel, M. Stan","doi":"10.1109/ISVLSI.2016.136","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.136","url":null,"abstract":"Body biasing (BB) in bulk CMOS is an important tool for circuit designers that enables dynamic modulation of device thresholds post-fabrication, thus potentially improving yields, or allowing the circuit to adapt to different power modes, such as fully active or sleep. Fully-depleted silicon-on-insulator (FDSOI) FETs, such as ultrathin body (UTB) devices, may benefit from the same effect when the buried oxide (BOX) is thin enough to allow back plane biasing (BPB) to affect the accumulation or inversion in the channel. However, when the BOX is thick the back plane potential has very little effect on the channel, eliminating the ability to modulate threshold voltage via BPB. Similarly, FinFETs benefit very little from controlled body effect because the gate has nearly full control over the channel. In this paper a new circuit topology is presented which can act as a substitute for body biasing without relying on the body effect. The inputs, outputs, and supply rails are split in such a way that the gates of some devices are overdriven without increasing voltage swing, resulting in a higher Ion and reduced latency under forward bias, or reducing leakage current under reverse bias. For a 28nm FDSOI process a speedup of up to 15% can be realized under forward bias with an increase in power of 19%, while static power can be reduced by up to 35% with a 19% decrease in performance.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134006563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STA: A Highly Scalable Low Latency Butterfly Fat Tree Based 3D NoC Design","authors":"Avik Bose, P. Ghosal, S. Mohanty","doi":"10.1109/ISVLSI.2016.127","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.127","url":null,"abstract":"Over the past decade, Network-on-Chip has evolved as the most dominant and efficient solution in the on-chip communication paradigm for multi-core systems. With the growing number of on-chip processing cores, modern three dimensional NoC design is facing several challenges originating from various network performance parameters like latency, hop count etc. Scalability and network efficiency have generated an important trade off in 3D NoC design, which needs to be balanced, especially for application specific NoC design. Tree based topologies outperform mesh based topologies in terms of network latency and throughput with increasing injection rate of packets/flits. But on the other hand, floorplanning becomes much more complex for tree based designs with increasing number of IP blocks compared to mesh due to the hierarchical structure. This paper introduces a novel 3D NoC architecture named Split Tree Architecture (STA), based on butterfly fat tree, which is highly scalable while maintaining low network latency and hop count. There are latency improvements of 51-91%, 84-96%, 55-96%, and 48-96% over mesh, torus, butterfly, and flattened butterfly topologies respectively. Average hop count is improved by 44% and 30% over mesh and torus. Average and minimum acceptance rates are improved by 3-8% and 3-12% over torus and, 4-7% and 4-12% over flattened butterfly. In comparison to the previously reported state of the art 3D BFT based designs, STA achieves performance improvements of 19-78%, 2-42%, 0.2-0.6%, and around 20%, for average latency, average acceptance rate, minimum acceptance rate, and average hop count respectively.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130927001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LLPA: Logic State Based Leakage Power Analysis","authors":"S. Dhanuskodi, S. Keshavarz, Daniel E. Holcomb","doi":"10.1109/ISVLSI.2016.121","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.121","url":null,"abstract":"Numerous side-channel attacks on integrated circuit implementations of cryptographic systems have been demonstrated in literature. Insecure implementations can reveal secret information through data dependencies in dynamic and leakage power profiles. Side-channel resistant logic styles are effective against dynamic power analysis attacks, but are suggested to exhibit weaknesses against the less common Leakage Power Analysis (LPA) attacks. We present a novel LPA attack that uses knowledge of a circuit's internal structure to mount a stronger attack via the leakage power side-channel, and show that even dual-rail side-channel resistant logic styles are susceptible to these LPA attacks. Our proposed LPA attack can successfully extract secret key information from S-boxes even in the presence of large amounts of random on-chip noise, and in scenarios where Hamming-weight based techniques are unsuitable. We also evaluate the impact of process variations on our scheme, and propose strategies for mitigating this impact.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131016727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Taylor Series Based Architecture for Quadruple Precision Floating Point Division","authors":"M. Jaiswal, Hayden Kwok-Hay So","doi":"10.1109/ISVLSI.2016.10","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.10","url":null,"abstract":"This paper presents an area efficient architecture for quadruple precision division arithmetic on the FPGA platform. Many applications demand higher precision computation (like quadruple precision) than single and double precision. Division is an important arithmetic operation, but a complete hardware implementation requires a huge amount of hardware resources as precision increases. So, this paper presents an iterative architecture for quadruple precision division arithmetic with small area requirement and promising speed. The implementation follows the standard processing steps for the floating point division arithmetic, including processing of sub-normal operands and exceptional case handling. The most dominating part of the architecture, the mantissa division, is based on the series expansion methodology of division, and designed in an iterative fashion to minimize the hardware requirement. This unit requires a 114×114 bit integer multiplier, and thus, an FPGA based area-efficient integer multiplier is also proposed with better design metrics than prior art. These proposed architectures are implemented on the Xilinx FPGA platform. The proposed quadruple precision division architecture shows a small hardware usage with promising speed.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127208903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of Division Circuits for Stochastic Computing","authors":"Te-Hsuan Chen, J. Hayes","doi":"10.1109/ISVLSI.2016.48","DOIUrl":"https://doi.org/10.1109/ISVLSI.2016.48","url":null,"abstract":"Stochastic computing (SC) encodes data in the signal probabilities associated with pseudo-random bit-streams. It enables very low-area and low-power arithmetic operations using standard VLSI circuits; it is also highly error-tolerant. While addition, subtraction and multiplication have extremely simple SC implementations, this is not true for division. Known stochastic dividers employ sequential logic circuits whose accuracy, convergence properties, etc., are unsatisfactory or not well understood. As a result, division is usually avoided or approximated in SC design. We first review and analyze in depth the existing design approaches to stochastic division. We then propose a novel division technique called CORDIV that exploits correlation between the input parameters. CORDIV not only has lower cost than previous stochastic dividers, but is also significantly more accurate. Area is reduced mainly because CORDIV requires less overhead for stochastic number conversion. We provide experimental data showing a typical 3x reduction in area and about a 10x improvement in accuracy.","PeriodicalId":140647,"journal":{"name":"2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121315394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}