{"title":"Power-aware multi-voltage custom memory models for enhancing RTL and low power verification","authors":"V. K. Kalyanam, M. Saint-Laurent, J. Abraham","doi":"10.1109/ICCD.2015.7357080","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357080","url":null,"abstract":"We describe a methodology to model the low power and voltage behavior of multi-voltage custom memories in processors. These models facilitate early power-aware verification by abstracting the transistor-level representation of the memory to its power-aware behavioral RTL model. To the best of our knowledge, this is the first attempt at addressing the power-aware RTL model generation problem for custom memories. In our method, we identify voltage crossing points in transistors across channel connected components and use these crossing points to transform the RTL for power-awareness closely matching its circuit implementation. Without the proposed abstraction technique to generate power-aware RTL, low-power verification of such memories will need to be done using transistor-level simulations that are prohibitively time-intensive and hence impractical. We check for correctness of these generated power-aware memory models through formal equivalence, symbolic simulations, assertion and simulation based verification. These models are also validated using static power-domain checks. By applying this methodology in a power-aware design and verification framework on a commercial processor, we identified and corrected low power circuit and RTL bugs prior to tape-out.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115445494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fixed-function hardware sorting accelerators for near data MapReduce execution","authors":"Seth H. Pugsley, Arjun Deb, R. Balasubramonian, Feifei Li","doi":"10.1109/ICCD.2015.7357143","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357143","url":null,"abstract":"A large fraction of MapReduce execution time is spent processing the Map phase, and a large fraction of Map phase execution time is spent sorting the intermediate key-value pairs generated by the Map function. Sorting accelerators can achieve high performance and low power because they lack the overheads of sorting implementations on general purpose hardware, such as instruction fetch and decode. We find that sorting accelerators are a good match for 3D-stacked Near Data Processing (NDP) because their sorting throughput is so high that it saturates the memory bandwidth available in other memory organizations. The increased sorting performance and low power requirement of fixed-function hardware lead to very high Map phase performance and energy efficiency, reducing Map phase execution time by up to 92%, and reducing energy consumption by up to 91%. We further find that sorting accelerators in a less exotic form of NDP outperform more expensive forms of 3D-stacked NDP without accelerators. We also implement the accelerator on an FPGA to validate our claims.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115663486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ItHELPS: Iterative high-accuracy error localization in post-silicon","authors":"V. Bertacco, Wade Bonkowski","doi":"10.1109/ICCD.2015.7357103","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357103","url":null,"abstract":"The increasing complexity of modern digital circuits has exacerbated the challenge of verifying the functionality of these systems. To further compound the issue, shrinking time-to-market constraints place increased pressure on attaining correct devices in short amounts of time. As a result, more and more of the burden of validation has shifted to the post-silicon stage, when the first silicon prototypes of a design become available. This validation phase brings much faster test execution speeds, at the cost of a very limited ability of diagnosing bugs. To further compound the problem, intermittent failures are not uncommon, due to the physical nature of the device under validation. In this work we propose ItHELPS, a solution to identify the timing of a bug manifestation and the root signals responsible for it in industry-size complex digital designs. We employ a synergistic approach based on a machine-learning solution (DBSCAN) paired with an adaptive refinement analysis, capable of narrowing the location of a failure down to a handful of signals, possibly buried deep within the design hierarchy. We find experimentally that our approach outperforms the accuracy of prior state-of-the-art solutions by two orders of magnitude.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"220 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124365288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAPITIMATE: Rapid performance estimation of pipelined processing systems containing shared memory","authors":"S. Min, Kapil Batra, Yusuke Yachide, Jorgen Peddersen, S. Parameswaran","doi":"10.1109/ICCD.2015.7357175","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357175","url":null,"abstract":"A pipeline of processors can increase the throughput of streaming applications significantly. Communication between processors in such a system can occur via FIFOs, shared memory or both. The use of a cache for the shared memory can improve performance. To see the effect of differing cache configurations (size, line size and associativity) on performance, typical full system simulations for each differing cache configuration must be performed. Rapid estimation of performance is difficult due to the cache being accessed by many processors. In this paper, for the first time, we show a method to estimate the performance of a pipelined processor system in the presence of differing sizes of caches which connect to the main memory. By performing just a few full simulations for a few cache configurations, and by using these simulations to estimate the hits and misses for other configurations, and then by carefully annotating the times of traces by the estimated hits and misses, we are able to estimate the throughput of a pipelined system to within 90% of its actual value. The estimation time takes less than 10% of full simulation time. The estimated values have a fidelity of 0.97 on average (1 being perfectly correlated) with the actual values.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122163680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-tolerant in-memory crossbar computing using quantified constraint solving","authors":"Alvaro Velasquez, Sumit Kumar Jha","doi":"10.1109/ICCD.2015.7357090","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357090","url":null,"abstract":"There has been a surge of interest in the effective storage and computation of data using nanoscale crossbars. In this paper, we present a new method for automating the design of fault-tolerant crossbars that can effectively compute Boolean formula. Our approach leverages recent advances in Satisfiability Modulo Theories (SMT) solving for quantified bit-vector formula (QBVF). We demonstrate that our method is well-suited for fault-tolerant computation and can perform Boolean computations despite stuck-open and stuck-closed interconnect defects as well as wire faults. We employ our framework to generate various arithmetic and logical circuits that compute correctly despite the presence of stuck-at faults as well as broken wires.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"141 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124882979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physical synthesis of DNA circuits with spatially localized gates","authors":"Jinwook Jung, Daijoon Hyun, Youngsoo Shin","doi":"10.1109/ICCD.2015.7357112","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357112","url":null,"abstract":"With the current DNA nanotechnology, we are now able to arrange DNA molecules on a DNA origami to compose a logic gate. This in turn realizes a spatially localized DNA circuit, on which the logic gates are placed on the specific locations as in electronic circuits. In this paper, we address three key problems in designing large-scale spatially localized DNA circuits. An AND gate, made of four hairpins, functions in stochastic manner and sometimes outputs a wrong result. Given tolerable error probability at each circuit output, we address how the probability that each AND gate functions correctly can be determined, which in turn determines the location of constituent hairpins. In the second problem, we study how hairpins are arranged on a DNA origami to minimize the area of a whole circuit, which determines the area of the origami board. The third problem regards the DNA domain assignment so that connected gates can communicate without interference.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127112153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Packet Field Extraction Engine (DPFEE): A pre-processor for network intrusion detection and denial-of-service detection systems","authors":"V. Jyothi, Sateesh Addepalli, R. Karri","doi":"10.1109/ICCD.2015.7357113","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357113","url":null,"abstract":"Network Intrusion Detection Systems (NIDS) and Anti-Denial-of-Service (DoS) employ Deep Packet Inspection (DPI) which provides visibility to the content of payload to detect network attacks. All DPI engines assume a pre-processing step that extracts the various protocol specific fields. However, application layer (L7) field extraction is computationally expensive. We propose a Deep Packet Field Extraction Engine (DPFEE) to offload the application layer field extraction to hardware. DPFEE is a content-aware, grammar-based, Layer 7 programmable field extraction engine for text-based protocols. Our prototype DPFEE implementation for the Session Initiation Protocol (SIP) on a single FPGA, achieved a bandwidth of 257.1 Gbps and this can be easily scaled beyond 300 Gbps.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"30 14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125173231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized local control strategy for voice-based interaction-tracking badges for social applications","authors":"Xiaowei Liu, A. Doboli, Fan Ye","doi":"10.1109/ICCD.2015.7357182","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357182","url":null,"abstract":"This paper presents a method to design optimized local control strategies for Cyber-Physical Systems that produce reliable data models for social applications. Data models have different semantics and abstraction levels. The local control strategies manage ad-hoc nano-clouds of embedded computing and communication nodes (CCNs) used for data collection, modeling, and communication. Control strategies consider tradeoffs defined by the resource constraints of embedded CCNs (e.g., computing power, communication bandwidth, and energy), assurance requirements (e.g., robustness) of the models, and privacy of users. Experiments evaluate and demonstrate the effectiveness of the control strategies for nano-clouds composed of smart voice-based interaction-tracking badges.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131462320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring well configurations for voltage level converter design in 28 nm UTBB FDSOI technology","authors":"P. Corsonello, S. Perri, F. Frustaci","doi":"10.1109/ICCD.2015.7357157","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357157","url":null,"abstract":"Voltage level converters are critical components in multi supply ultra-low voltage designs, especially when signals need to be converted from the sub-threshold to the above-threshold domain. In these designs, advanced technology processes, such as the Ultra-Thin Body and Buried oxide (UTBB) Fully-Depleted SOI (FDSOI), are greatly desired since they intrinsically allow controlling the Drain Induced Barrier Lowering effect (DIBL) and the Gate Induced Drain Leakage (GIDL), in addition to the reduction of the effects of process variations. Moreover, these technologies provide a group of architectural and device-level techniques for threshold voltage adjustment that can be efficiently adopted to combine high performances and low energy consumption. However, specific design strategies should be applied to efficiently exploit all these potentialities. This paper investigates how the physical design of level converters can benefit from the synergistic adoption of the knobs available in the UTBB FDSOI technology (poly biasing, flip-well, single-well, back biasing). In particular, three mixed single well configurations have been implemented and analyzed. This research work demonstrates that the specific selected approach allows decreasing the energy per cycle consumption, the leakage current and the delay by up to 35.3%, 70.4%, and 6.2%, respectively, with respect to the basic conventional design strategy. Furthermore, statistical analysis confirmed that these advantages are maintained for a wide range of process variations, also improving the functional yield and the minimum input voltage causing the level converter failure.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127581280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dark silicon aware runtime mapping for many-core systems: A patterning approach","authors":"A. Kanduri, M. Haghbayan, A. Rahmani, P. Liljeberg, A. Jantsch, H. Tenhunen","doi":"10.1109/ICCD.2015.7357167","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357167","url":null,"abstract":"Limitation on power budget in many-core systems leaves a fraction of on-chip resources inactive, referred to as dark silicon. In such systems, an efficient run-time application mapping approach can considerably enhance resource utilization and mitigate the dark silicon phenomenon. In this paper, we propose a dark silicon aware runtime application mapping approach that patterns active cores alongside the inactive cores in order to evenly distribute power density across the chip. This approach leverages dark silicon to balance the temperature of active cores to provide higher power budget and better resource utilization, within a safe peak operating temperature. In contrast with exhaustive search based mapping approach, our agile heuristic approach has a negligible runtime overhead. Our patterning strategy yields a surplus power budget of up to 17% along with an improved throughput of up to 21% in comparison with other state-of-the-art run-time mapping strategies, while the surplus budget is as high as 40% compared to worst case scenarios.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127605585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}