{"title":"SRAM stability analysis for different cache configurations due to Bias Temperature Instability and Hot Carrier Injection","authors":"Taizhi Liu, Chang-Chih Chen, Jiadong Wu, L. Milor","doi":"10.1109/ICCD.2016.7753284","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753284","url":null,"abstract":"Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) are two of the main effects that increase a transistor's threshold voltage and, in turn, degrade performance. These two wearout mechanisms affect all transistors, but are especially acute in the SRAM cells of first-level (L1) caches, which are frequently accessed and are critical for microprocessor performance. This work studies cache lifetimes under the combined effect of BTI and HCI for different cache configurations, including variation in cache size, associativity, cache line size, and the replacement algorithm. The effect of process variations is also considered. We analyze the reliability (failure probability) and performance (hit rate) of the L1 cache within a LEON3 microprocessor while the LEON3 is running a set of benchmarks, and we provide essential insights on performance-reliability tradeoffs for cache designers.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132534259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design techniques of eNVM-enabled neuromorphic computing systems","authors":"Chang Song, Beiye Liu, Chenchen Liu, Hai Helen Li, Yiran Chen","doi":"10.1109/ICCD.2016.7753356","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753356","url":null,"abstract":"The recently emerged research on “neuromorphic computing”, which stands for hardware acceleration of brain-inspired computing, has become one of the most active research areas in computer engineering. In this invited paper, we start with a background introduction of neuromorphic computing, followed by some examples of hardware acceleration schemes of learning and neural network algorithms on emerging nonvolatile memory (eNVM)-based neuromorphic computing engine. At the end, we share our prospects on the future technology challenges and advances of neuromorphic computing.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"4 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130045400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ultra-low energy security circuits for IoT applications","authors":"Sudhir K. Satpathy, S. Mathew, Vikram B. Suresh, R. Krishnamurthy","doi":"10.1109/ICCD.2016.7753358","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753358","url":null,"abstract":"Low-area energy-efficient security primitives are key building blocks for enabling end-to-end content protection, user authentication, and consumer confidentiality in the IoT world that is estimated to surpass 50 billion smart and connected devices by 2020. This paper describes design approaches that blend energy-efficient circuit techniques with optimal accelerator microarchitecture datapath, and hardware-friendly arithmetic to achieve ultra-low energy consumption in security platforms for seamless adoption in area/battery constrained and self-powered systems. Industry-leading energy-efficiency is demonstrated with three designs, fabricated and measured in advanced process technologies: 1) A 2040-gate arithmetically optimized composite-field Sbox based AES accelerator achieves 289Gbps/W peak energy-efficiency while offering 432Mbps throughput in 22nm tri-gate CMOS. 2) A hybrid Physically Unclonable Function (PUF) circuit leverages burn-in induced aging to reduce bit-error, followed by temporal-majority-voting, dark-bit masking, and error-correction conditioning techniques to generate a 100% stable full-entropy key with 190fJ/bit energy consumption in 22nm tri-gate CMOS. 3) A light-weight all-digital TRNG uses in-line correlation suppressor and entropy-extractor circuits to achieve >0.99 min-entropy with 3pJ/bit measured energy-efficiency while operating down to 300mV in 14nm tri-gate CMOS.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133302646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DLL: A dynamic latency-aware load-balancing strategy in 2.5D NoC architecture","authors":"Chen Li, Sheng Ma, Lu Wang, Zicong Wang, Xia Zhao, Yang Guo","doi":"10.1109/ICCD.2016.7753352","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753352","url":null,"abstract":"Because 3D stacking technology still faces several challenges, 2.5D stacking technology currently has better application prospects. With the silicon interposer, 2.5D stacking can improve the bandwidth and capacity of the memory system. To satisfy the communication requirements of the integrated memory system, the free routing resources in the interposer should be explored to implement an additional network. Yet, the performance is strongly limited by the unbalanced loads between the CPU-layer network and the interposer-layer network. In this paper, to address this issue, we propose a dynamic latency-aware load-balancing (DLL) strategy. Our key innovations are detecting congestion of the network layer via the average latency of recent packets and making the network layer selection at each source node. We leverage the free routing resources in the interposer to implement a latency propagation ring. With the ring, the latency information tracked at destination nodes is propagated back to source nodes. We achieve load balance by using this information. Experimental results show that, compared with the baseline design, a destination-detection strategy, and a buffer-aware strategy, our DLL strategy achieves average throughput improvements of 45%, 14.9%, and 6.5%, respectively, with minor overheads.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"29 42","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132275549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Trust Zone: Incorporating trust and reliability into FPGA designs","authors":"V. Jyothi, Manasa Thoonoli, Richard Stern, R. Karri","doi":"10.1109/ICCD.2016.7753346","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753346","url":null,"abstract":"This paper proposes a novel methodology, FPGA Trust Zone (FTZ), that incorporates security into the design cycle to detect and isolate anomalies such as Hardware Trojans in the FPGA fabric. Anomalies are identified as violations of the spatial correlation of process variation in the FPGA fabric. Anomalies are isolated using the Xilinx Isolation Design Flow (IDF) methodology. FTZ helps identify and partition the FPGA into areas that are devoid of anomalies and thus helps run designs securely and reliably even in an anomaly-infected FPGA. FTZ also assists IDF in selecting trustworthy areas for implementing isolated designs and trusted routes. We demonstrate the effectiveness of FTZ for AES and RC5 designs on Xilinx Virtex-7 and Artix-7 FPGAs.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115331260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memos: A full hierarchy hybrid memory management framework","authors":"Lei Liu, Hao Yang, Yong Li, Mengyao Xie, Lian Li, Chenggang Wu","doi":"10.1109/ICCD.2016.7753305","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753305","url":null,"abstract":"In this paper, we introduce memos, which integrates suitable memory management policies and schedules resources over the entire memory hierarchy in a hybrid memory system. Powered by an OS kernel-level monitoring tool, memos captures memory patterns online and then leverages them to guide memory page placement and data mapping. Experimental results show that, on average, memos can benefit memory utilization, contributing to system throughput and QoS by 19.1% and 23.6%. Moreover, memos can reduce the NVM side memory latency by 3~83.3%, energy consumption by 25.1~99%, and benefit the NVM lifetime significantly (40× improvement on average).","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114725577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategies for optimal operating point selection in timing speculative processors","authors":"Omid Assare, Rajesh K. Gupta","doi":"10.1109/ICCD.2016.7753344","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753344","url":null,"abstract":"Performance of timing speculative processors relies on strategies for accurate prediction of optimal operating points. In this paper, we develop an efficient process-variation-aware simulation framework and use it to evaluate a range of such timing speculation strategies. Our experiments on a timing speculative processor running applications from the MiBench benchmark suite show that, in a typical case, while a perfect timing speculation strategy can improve throughput by up to 143% over a guardbanded design, the most commonly used approach in the literature achieves only 21.8% of the potential gains. By improving the speculation accuracy, the new strategies we propose in this paper can realize up to 35.6% of the potential gains, a throughput improvement of 50.9% over a guardbanded design.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"577 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115896302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable memory architecture for soft-core processors","authors":"T. Jost, G. Nazar, L. Carro","doi":"10.1109/ICCD.2016.7753312","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753312","url":null,"abstract":"Restrictions over memory performance have always had a great impact on soft-core processors. The reduced number of ports on FPGAs' block RAMs may limit the exploitation of parallelism on soft-core processors that are implemented on top of these devices. Multiple memory ports on FPGAs are cumbersome and do not scale well, having a high cost in area and power consumption when implemented. In order to mitigate the impact of the memory bottleneck on such devices, we propose a scalable memory architecture for soft-cores. We make use of software-managed memories to build a memory system capable of improving performance and instruction-level parallelism (ILP) on soft-core processors. Results show that our architecture overcomes the limited parallelism realized on a dual-ported processor, reducing execution time by 16.5%. These improvements come with no area costs, as the processor is kept with the same total memory. Automated code transformations implemented within the LLVM compiler keep changes in application code to a minimum. We also show that our architecture scales better when boosting the number of functional units in the system.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116359002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BADGR: A practical GHR implementation for TAGE branch predictors","authors":"David J. Schlais, Mikko H. Lipasti","doi":"10.1109/ICCD.2016.7753338","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753338","url":null,"abstract":"In this work, we explore global history register (GHR) implementations for Tagged Geometric length (TAGE) style branch predictors with speculative updates. We break down the requirements to both update and recover TAGE predictors' history registers during normal operation and after misspeculation, discussing where various designs exhibit large checkpoint and/or operation overheads. To reduce these inefficiencies, we introduce BADGR, a novel GHR design for TAGE predictors that lowers power consumption and chip area over naive checkpointing techniques by 90% and 85%, respectively.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131496847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive and flexible key-value stores through soft data partitioning","authors":"B. Hong, Yongkee Kwon, Jung Ho Ahn, John Kim","doi":"10.1109/ICCD.2016.7753293","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753293","url":null,"abstract":"Key-value stores such as Memcached have become widely used by cloud and web-service providers. While there has been a significant amount of research done on improving the absolute performance of key-value stores, this work proposes an adaptive and flexible approach to key-value stores. We first propose soft data partitioning that divides memory into multiple groups within a single node, or a single server process, to enable scale-up of key-value stores, while providing NUMA locality and an adaptive approach that can reduce overall request miss rate. The soft partitioning enables a flexible Memcached server implementation in a NUMA system through NUMA-aware allocation as well as power-efficient NUMA server operation by migrating frequently accessed key-value pairs among the groups. We also propose an adaptive replacement policy within the Memcached server that compares miss rates across the different memory groups to determine a better replacement policy. To overcome the limitation of partitioning, we propose Group Auto-Balancing (GAB), where memory allocation from the different groups can be borrowed to minimize miss rate. Our results improve Memcached throughput by 12.9%, on average, over the previously proposed MemC3 algorithm (up to 3.1× for write-intensive workloads), while the adaptive replacement policy shows the lowest miss rate on adversarial access patterns.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132536547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}