{"title":"Underpowering NAND flash: Profits and perils","authors":"Hung-Wei Tseng, Laura M. Grupp, S. Swanson","doi":"10.1145/2463209.2488935","DOIUrl":"https://doi.org/10.1145/2463209.2488935","url":null,"abstract":"MLC flash memory is becoming more popular in computer systems ranging from sensor networks and embedded systems to large-scale server systems. However, MLC flash has many reliability concerns, including the potential for corruption due to supply voltage fluctuations. This paper characterizes MLC flash when the chip is underpowered (i.e., power fading and voltage droops). We demonstrate that underpowering flash can cause serious errors, but can also help save up to 45% of operation energy without incurring failure.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130090827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical decoding of double error correcting codes for high speed reliable memories","authors":"Zhen Wang","doi":"10.1145/2463209.2488861","DOIUrl":"https://doi.org/10.1145/2463209.2488861","url":null,"abstract":"As the technology moves into the nano-realm, traditional single-error-correcting, double-error-detecting (SEC-DED) codes are no longer sufficient for protecting memories against transient errors due to the increased multi-bit error rate. The well known double-error-correcting BCH codes and the classical decoding method for BCH codes based on Berlekamp-Massey algorithm and Chien search cannot be directly adopted to replace SEC-DED codes because of their much larger decoding latency. In this paper, we propose the hierarchical double-error-correcting (HDEC) code. The construction methods and the decoder architecture for the codes are described. The presented error correcting algorithm takes only 1 clock cycle to finish if no error or a single-bit error occurs. When there are multi-bit errors, the decoding latency is O(log_2 m) clock cycles for codes defined over GF(2^m). This is much smaller than the latency for decoding BCH codes using Berlekamp-Massey algorithm and Chien search, which is O(k) clock cycles - k is the number of information bits for the code and m ~ O(log_2 k). Synthesis results show that the proposed (79, 64) HDEC code requires only 80% of the area and consumes < 70% of the power compared to the classical (78, 64) BCH code. For a large bit distortion rate (10^-3 ~ 10^-2), the average decoding latency for the (79, 64) HDEC code is only 36% ~ 60% of the latency for decoding the (78, 64) BCH code.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114776324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping on multi/many-core systems: Survey of current and emerging trends","authors":"Ashutosh Kumar Singh, M. Shafique, Akash Kumar, J. Henkel","doi":"10.1145/2463209.2488734","DOIUrl":"https://doi.org/10.1145/2463209.2488734","url":null,"abstract":"The reliance on multi/many-core systems to satisfy the high performance requirement of complex embedded software applications is increasing. This necessitates the need to realize efficient mapping methodologies for such complex computing platforms. This paper provides an extensive survey and categorization of state-of-the-art mapping methodologies and highlights the emerging trends for multi/many-core systems. The methodologies aim at optimizing system's resource usage, performance, power consumption, temperature distribution and reliability for varying application models. The methodologies perform design-time and run-time optimization for static and dynamic workload scenarios, respectively. These optimizations are necessary to fulfill the end-user demands. Comparison of the methodologies based on their optimization aim has been provided. The trend followed by the methodologies and open research challenges have also been discussed.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131821648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time use-aware adaptive MIMO RF receiver systems for energy efficiency under BER constraints","authors":"D. Banerjee, S. Devarakond, Shreyas Sen, A. Chatterjee","doi":"10.1145/2463209.2488802","DOIUrl":"https://doi.org/10.1145/2463209.2488802","url":null,"abstract":"Modern MIMO RF transceiver systems are designed to operate reliably under diverse channel conditions, leading to the incorporation of significant performance margins in RF transceiver systems. In general, across dynamically varying channel conditions, the fidelity of the RF front-end devices can be traded off against power consumption without compromising system-level BER limits. In this work, such real-time performance vs. power consumption modulation of RF front-end devices in MIMO systems is demonstrated. Through a multi-dimensional optimization technique, power-optimal configurations of the front end for varying channel conditions are created. Additionally, multiple low-power operating modes for the MIMO system are proposed depending on the performance metric (data rate or energy-per-bit) that needs to be optimized for different applications.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133593649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the potential of 3D integration of inductive DC-DC converter for high-performance power delivery","authors":"S. Carlo, W. Yueh, S. Mukhopadhyay","doi":"10.1145/2463209.2488955","DOIUrl":"https://doi.org/10.1145/2463209.2488955","url":null,"abstract":"This paper studies the potential and challenges of integrating an inductor-based DC-DC converter based voltage regulator module (VRM) as a separate die with the processor for a high-performance power delivery network (PDN). The frequency domain analysis of the PDN considering the converter shows that 3D integration of the VRM improves PDN impedance, but the effectiveness depends on the converter design and whether the LC filter is integrated on-board, on-package, or on-die with the die-stack. The methodologies to co-design the converter with PDN and packaging scenarios are discussed, and implications on PDN impedance and power losses are studied to maximally exploit the advantage of 3D stacking.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130541132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LEQA: Latency estimation for a quantum algorithm mapped to a quantum circuit fabric","authors":"M. Dousti, Massoud Pedram","doi":"10.1145/2463209.2488786","DOIUrl":"https://doi.org/10.1145/2463209.2488786","url":null,"abstract":"This paper presents LEQA, a fast latency estimation tool for evaluating the performance of a quantum algorithm mapped to a quantum fabric. The actual quantum algorithm latency can be computed by performing detailed scheduling, placement and routing of the quantum instructions and qubits in a quantum operation dependency graph on a quantum circuit fabric. This is, however, a very expensive proposition that requires large amounts of processing time. Instead, LEQA, which is based on computing the neighborhood population counts of qubits, can produce estimates of the circuit latency with good accuracy (i.e., an average of less than 3% error) with up to two orders of magnitude speedup for mid-size benchmarks. This speedup is expected to increase superlinearly as a function of circuit size (operation count).","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132437542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple chip planning for chip-interposer codesign","authors":"Yuan-Kai Ho, Yao-Wen Chang","doi":"10.1145/2463209.2488767","DOIUrl":"https://doi.org/10.1145/2463209.2488767","url":null,"abstract":"An interposer-based three-dimensional integrated circuit, which introduces a silicon interposer as an interface between chips and a package, is one of the most promising integration technologies for modern and next-generation circuit designs. Inter-chip connections can be routed on the interposer by chip-scale wires to enhance design quality. However, its design complexity increases dramatically due to the extra interposer interface. Consequently, it is desirable to simultaneously consider the co-design of the interposer and multiple chips mounted on it. This paper addresses the first work of chip-interposer codesign to place multiple chips on an interposer to reduce inter-chip wirelength. For this problem, we propose a new hierarchical B*-tree to simultaneously place multiple chips, macros, and I/O Buffers. An approach based on bipartite matching is then proposed to concurrently assign signals from I/O buffers to micro bumps. Experimental results show that our approach is effective and efficient for the codesign problem.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125135436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving PUF security with regression-based distiller","authors":"C. Yin, G. Qu","doi":"10.1145/2463209.2488960","DOIUrl":"https://doi.org/10.1145/2463209.2488960","url":null,"abstract":"Silicon physical unclonable functions (PUFs) utilize fabrication variation to extract information that is unique to each chip. However, fabrication variation has a very strong spatial correlation and thus the PUF information will not be statistically random, which poses security threats to silicon PUFs. We propose to decouple the unwanted systematic variation from the desired random variation through a regression-based distiller. In our experiments, we show that information generated by existing PUF schemes fails to pass the NIST randomness test. However, our proposed method can provide statistically random PUF information and thus bolster the security characteristics of existing PUF schemes.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129073556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Tunnel-FET for ultra low power analog applications: A case study on operational transconductance amplifier","authors":"A. Trivedi, S. Carlo, S. Mukhopadhyay","doi":"10.1145/2463209.2488868","DOIUrl":"https://doi.org/10.1145/2463209.2488868","url":null,"abstract":"This work studies the potential and challenges of designing ultra low-power analog circuits exploiting the unique characteristics of the Tunnel-FET (TFET). TFETs can achieve ultra-low quiescent current (~pA). In subthreshold operation, TFETs exhibit a subthreshold swing lower than 60mV/decade, and hence higher transconductance per bias current than the MOSFET. TFETs also exhibit very weak temperature dependence and higher output resistance. Among several challenges, TFETs demonstrate higher shot noise at low biasing current. Through the design of a TFET-based Operational Transconductance Amplifier (OTA), these challenges and opportunities are discussed. For implantable bio-medical applications, a TFET OTA based neural amplifier design is studied.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129645135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-power area-efficient large-scale ip lookup engine based on binary-weighted clustered networks","authors":"N. Onizawa, W. Gross","doi":"10.1145/2463209.2488801","DOIUrl":"https://doi.org/10.1145/2463209.2488801","url":null,"abstract":"We propose a novel architecture for low-power area-efficient large-scale IP lookup engines. The proposed architecture greatly increases memory efficiency by storing associations between IP addresses and their output rules instead of storing these data themselves. The rules can be determined by simple hardware using a few associations read from SRAMs, eliminating a power-hungry search of input addresses in TCAMs. The proposed hardware that stores 100,000 144-bit entries is evaluated under TSMC 65nm CMOS technology. The dynamic power dissipation and the area of the proposed hardware are 4.6% and 30.6% of a traditional TCAM, respectively while maintaining comparable throughput.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124480006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}