{"title":"NUMANA: A hybrid numerical and analytical thermal simulator for 3-D ICs","authors":"Yu-Min Lee, T. Wu, Pei-Yu Huang, C. Yang","doi":"10.7873/DATE.2013.282","DOIUrl":"https://doi.org/10.7873/DATE.2013.282","url":null,"abstract":"By combining analytical and numerical simulation techniques, this work develops a hybrid thermal simulator, NUMANA, which can effectively deal with complicated material structures, to estimate the temperature profile of a 3-D IC. Compared with a commercial tool, ANSYS, its maximum relative error is only 1.84%. Compared with a well known linear system solver, SuperLU [1], it can achieve orders of magnitude speedup.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"23 1","pages":"1379-1384"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80567610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A satisfiability approach to speed assignment for distributed real-time systems","authors":"Pratyush Kumar, Devesh B. Chokshi, L. Thiele","doi":"10.7873/DATE.2013.160","DOIUrl":"https://doi.org/10.7873/DATE.2013.160","url":null,"abstract":"We study the problem of assigning speeds to resources serving distributed applications with delay, buffer and energy constraints. We argue that the considered problem does not have any straightforward solution due to the intricately related constraints. As a solution, we propose using Real-Time Calculus (RTC) to analyse the constraints and a SATisfiability solver to efficiently explore the design space. To this end, we develop an SMT solver by using the OpenSMT framework and the Modular Performance Analysis (MPA) toolbox. Two key enablers for this implementation are the analysis of incomplete models and generation of conflict clauses in RTC. The results on problem instances with very large decision spaces indicate that the proposed SMT solver performs very well in practice.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"72 1","pages":"749-754"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80575427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a group-based RO PUF","authors":"C. Yin, G. Qu, Qiang Zhou","doi":"10.7873/DATE.2013.094","DOIUrl":"https://doi.org/10.7873/DATE.2013.094","url":null,"abstract":"The silicon physical unclonable functions (PUF) utilize the uncontrollable variations during integrated circuit (IC) fabrication process to facilitate security related applications such as IC authentication. In this paper, we describe a new framework to generate secure PUF secret from ring oscillator (RO) PUF with improved hardware efficiency. Our work is based on the recently proposed group-based RO PUF with the following novel concepts: an entropy distiller to filter the systematic variation; a simplified grouping algorithm to partition the ROs into groups; a new syndrome coding scheme to facilitate error correction; and an entropy packing method to enhance coding efficiency and security. Using RO PUF dataset available in the public domain, we demonstrate these concepts can create PUF secret that can pass the NIST randomness and stability tests. Compared to other state-of-the-art RO PUF design, our approach can generate an average of 72% more PUF secret with the same amount of hardware.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"7 1","pages":"416-421"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79224777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reachability analysis of nonlinear analog circuits through iterative reachable set reduction","authors":"S. Ahmadyan, Shobha Vasudevan","doi":"10.7873/DATE.2013.293","DOIUrl":"https://doi.org/10.7873/DATE.2013.293","url":null,"abstract":"We propose a methodology for reachability analysis of nonlinear analog circuits to verify safety properties. Our iterative reachable set reduction algorithm initially considers the entire state space as reachable. Our algorithm iteratively determines which regions in the state space are unreachable and removes those unreachable regions from the over approximated reachable set. We use the State Partitioning Tree (SPT) algorithm to recursively partition the reachable set into convex polytopes. We determine the reachability of adjacent neighbor polytopes by analyzing the direction of state space trajectories at the common faces between two adjacent polytopes. We model the direction of the trajectories as a reachability decision function that we solve using a sound root counting method. We are faithful to the nonlinearities of the system. We demonstrate the memory efficiency of our algorithm through computation of the reachable set of Van der Pol oscillation circuit.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"38 1","pages":"1436-1441"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88739279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Hu, Qingfeng Zhuge, C. Xue, Wei-Che Tseng, E. Sha
{"title":"Software enabled wear-leveling for hybrid PCM main memory on embedded systems","authors":"J. Hu, Qingfeng Zhuge, C. Xue, Wei-Che Tseng, E. Sha","doi":"10.7873/DATE.2013.131","DOIUrl":"https://doi.org/10.7873/DATE.2013.131","url":null,"abstract":"Phase Change Memory (PCM) is a promising DRAM replacement in embedded systems due to its attractive characteristics. However, relatively low endurance has limited its practical applications. In this paper, in additional to existing hardware level optimizations, we propose software enabled wear-leveling techniques to further extend PCM's lifetime when it is adopted in embedded systems. A polynomial-time algorithm, the Software Wear-Leveling (SWL) algorithm, is proposed in this paper to achieve wear-leveling without hardware overhead. According to the experimental results, the proposed technique can reduce the number of writes on the most-written bits by more than 80% when compared with a greedy algorithm, and by around 60% when compared with the existing Optimal Data Allocation (ODA) algorithm with under 6% memory access overhead.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"45 1","pages":"599-602"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80652514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-synthesis of data paths and clock control paths for minimum-period clock gating","authors":"Wen-Pin Tu, Shih-Hsu Huang, Chun-Hua Cheng","doi":"10.7873/DATE.2013.366","DOIUrl":"https://doi.org/10.7873/DATE.2013.366","url":null,"abstract":"Although intentional clock skew can be utilized to reduce the clock period, its application in gated clock designs has not been well studied. A gated clock design includes both data paths and clock control paths, but conventional clock skew scheduling only focus on data paths. Based on that observation, in this paper, we propose an approach to perform the co-synthesis of data paths and clock control paths in a nonzero skew gated clock design. Our objective is to minimize the required inserted delay for working with the lower bound of the clock period (under clocking constraints of both data paths and clock control paths). Different from previous works, our approach can guarantee no clocking constraint violation in the presence of clock gating. Experimental results show our approach can effectively enhance the circuit speed with almost no penalty on the power consumption.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"22 1","pages":"1831-1836"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85269981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient cache architectures for reliable hybrid voltage operation using EDC codes","authors":"Bojan Maric, J. Abella, M. Valero","doi":"10.7873/DATE.2013.193","DOIUrl":"https://doi.org/10.7873/DATE.2013.193","url":null,"abstract":"Semiconductor technology evolution enables the design of sensor-based battery-powered ultra-low-cost chips (e.g., below 1 €) required for new market segments such as body, urban life and environment monitoring. Caches have been shown to be the highest energy and area consumer in those chips. This paper proposes a novel, hybrid-operation (high Vcc, ultra-low Vcc), single-Vcc domain cache architecture based on replacing energy-hungry bitcells (e.g., 10T) by more energy-efficient and smaller cells (e.g., 8T) enhanced with Error Detection and Correction (EDC) features for high reliability and performance predictability. Our architecture is proven to largely outperform existing solutions in terms of energy and area.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"48 1","pages":"917-920"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89659450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiansheng Zhang, A. Cevrero, Giulia Beanato, Panagiotis Athanasopoulos, A. Coskun, Y. Leblebici
{"title":"3D-MMC: A modular 3D multi-core architecture with efficient resource pooling","authors":"Tiansheng Zhang, A. Cevrero, Giulia Beanato, Panagiotis Athanasopoulos, A. Coskun, Y. Leblebici","doi":"10.7873/DATE.2013.257","DOIUrl":"https://doi.org/10.7873/DATE.2013.257","url":null,"abstract":"This paper demonstrates a fully functional hardware and software design for a 3D stacked multi-core system for the first time. Our 3D system is a low-power 3D Modular Multi-Core (3D-MMC) architecture built by vertically stacking identical layers. Each layer consists of cores, private and shared memory units, and communication infrastructures. The system uses shared memory communication and Through-Silicon-Vias (TSVs) to transfer data across layers. A serialization scheme is employed for inter-layer communication to minimize the overall number of TSVs. The proposed architecture has been implemented in HDL and verified on a test chip targeting an operating frequency of 400MHz with a vertical bandwidth of 3.2Gbps. The paper first evaluates the performance, power and temperature characteristics of the architecture using a set of software applications we have designed. We demonstrate quantitatively that the proposed modular 3D design improves upon the cost and performance bottlenecks of traditional 2D multi-core design. In addition, a novel resource pooling approach is introduced to efficiently manage the shared memory of the 3D stacked system. Our approach reduces the application execution time significantly compared to 2D and 3D systems with conventional memory sharing.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"38 1","pages":"1241-1246"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90200485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto A. Del Barrio, R. Hermida, S. Memik, J. Mendias, M. Molina
{"title":"Multispeculative additive trees in High-Level Synthesis","authors":"Alberto A. Del Barrio, R. Hermida, S. Memik, J. Mendias, M. Molina","doi":"10.7873/DATE.2013.052","DOIUrl":"https://doi.org/10.7873/DATE.2013.052","url":null,"abstract":"Multispeculative Functional Units (MSFUs) are arithmetic functional units that operate using several predictors for the carry signal. The carry prediction helps to shorten the critical path of the functional unit. The average performance of these units is determined by the hit rate of the prediction. In spite of utilizing more than one predictor, none or only one additional cycle is enough for producing the correct result in the majority of the cases. In this paper we present multispeculation as a way of increasing the performance of tree structures with a negligible area penalty. By judiciously introducing these structures into computation trees, it will only be necessary to predict in certain selected nodes, thus minimizing the number of operations that can potentially mispredict. Hence, the average latency will be diminished and thus performance will be increased. Our experiments show that it is possible to improve on average 24% and 38% execution time, when considering logarithmic and linear modules, respectively.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"91 1","pages":"188-193"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73527796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Coppola, B. Falsafi, J. Goodacre, Georgios Kornaros
{"title":"From embedded multi-core SoCs to scale-out processors","authors":"M. Coppola, B. Falsafi, J. Goodacre, Georgios Kornaros","doi":"10.7873/DATE.2013.199","DOIUrl":"https://doi.org/10.7873/DATE.2013.199","url":null,"abstract":"Information technology is now an indispensable pillar of a modern day society. CMOS technologies, which lay the foundation for all digital platforms, however, are experiencing a major inflection point due to a slowdown in voltage scaling. The net result is that power is emerging as the key design constraint for all platforms from embedded systems to datacenters. This tutorial presents emerging design paradigms from embedded multicore SoCs to server processors for scale-out datacenters based on mobile cores.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"448 1","pages":"947-951"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76612829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}