G. Singla, Gurinderjit Kaur, Ali K. Unver, Ümit Y. Ogras
{"title":"Predictive dynamic thermal and power management for heterogeneous mobile platforms","authors":"G. Singla, Gurinderjit Kaur, Ali K. Unver, Ümit Y. Ogras","doi":"10.7873/DATE.2015.1036","DOIUrl":"https://doi.org/10.7873/DATE.2015.1036","url":null,"abstract":"Heterogeneous multiprocessor systems-on-chip (MPSoCs) powering mobile platforms integrate multiple asymmetric CPU cores, a GPU, and many specialized processors. When the MPSoC operates close to its peak performance, power dissipation easily increases the temperature, hence adversely impacts reliability. Since using a fan is not a viable solution for hand-held devices, there is a strong need for dynamic thermal and power management (DTPM) algorithms that can regulate temperature with minimal performance impact. This paper presents a DTPM algorithm based on a practical temperature prediction methodology using system identification. The DTPM algorithm dynamically computes a power budget using the predicted temperature, and controls the types and number of active processors as well as their frequencies. Experiments on an octa-core big. LITTLE processor and common Android apps demonstrate that the proposed technique predicts temperature within 3% accuracy, while the DTPM algorithm provides around 6× reduction in temperature variance, and as large as 16% reduction in total platform power compared to using a fan.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124539143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Liaperdos, H. Stratigopoulos, L. Abdallah, Y. Tsiatouhas, A. Arapoyanni, Xin Li
{"title":"Fast deployment of alternate analog test using Bayesian model fusion","authors":"J. Liaperdos, H. Stratigopoulos, L. Abdallah, Y. Tsiatouhas, A. Arapoyanni, Xin Li","doi":"10.7873/DATE.2015.0102","DOIUrl":"https://doi.org/10.7873/DATE.2015.0102","url":null,"abstract":"In this paper, we address the problem of limited training sets for learning the regression functions in alternate analog test. Typically, a large volume of real data needs to be collected from different wafers and lots over a long period of time to be able to train the regression functions with accuracy across the whole design space and apply alternate test with high confidence. To avoid this delay and achieve a fast deployment of alternate test, we propose to use the Bayesian model fusion technique that leverages prior knowledge from simulation data and fuses this information with data from few real circuits to draw accurate regression functions across the whole design space. The technique is demonstrated for an alternate test designed for RF low noise amplifiers.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"234 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124579905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Weis, Matthias Jung, Peter Ehses, C. Santos, P. Vivet, Sven Goossens, Martijn Koedam, N. Wehn
{"title":"Retention time measurements and modelling of bit error rates of WIDE I/O DRAM in MPSoCs","authors":"C. Weis, Matthias Jung, Peter Ehses, C. Santos, P. Vivet, Sven Goossens, Martijn Koedam, N. Wehn","doi":"10.7873/DATE.2015.0258","DOIUrl":"https://doi.org/10.7873/DATE.2015.0258","url":null,"abstract":"DRAM cells use capacitors as volatile and leaky bit storage elements. The time spent without refreshing them is called retention time. It is well known that the retention time depends inverse exponentially on the temperature. In 3D stacking, the challenges of high power densities and thermal dissipation are exacerbated and have a much stronger impact on the retention time of 3D-stacked WIDE I/O DRAMs that are placed on top of an MPSoC. Consequently, it is very important to study the temperature behaviour of WIDE I/O DRAMs. To the best of our knowledge, no investigations based on real measurements were done for stacked DRAM-on-logic devices. In this paper, we first provide detailed measurements on temperature-dependent retention time and bit error rates of WIDE I/O DRAMs. To obtain the correct temperature distribution of the WIDE-I/O DRAM die we use an advanced thermal modelling tool: the DOCEA AceThermalModelerTM (ATM). The WIDE I/O DRAM retention times and bit error rates are compared to the behaviour of 2D-DRAM chips (DIMMs) with the help of an advanced FPGA-based test system. We observed data pattern dependencies and variable retention times (VRTs). Second, based on this data, we develop and validate a SystemC-TLM2.0 DRAM bit error rate model. Our proposed DRAM bit error model enables early investigations on the temperature vs. retention time trade-off in future 3D-stacked MPSoCs with WIDE I/O DRAMs in SystemC-TLM2.0 environments.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128535520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shahzad Muzaffar, Jerald Yoo, A. Shabra, I. Elfadel
{"title":"A pulsed-index technique for single-channel, low-power, dynamic signaling","authors":"Shahzad Muzaffar, Jerald Yoo, A. Shabra, I. Elfadel","doi":"10.7873/DATE.2015.1070","DOIUrl":"https://doi.org/10.7873/DATE.2015.1070","url":null,"abstract":"The most common operation of an IoT sensor is that of short activity bursts separated by long time intervals in sleep or listen modes. During the data bursts, sensed information has to be reliably communicated in real time without draining the energy resources of the sensor node. One way to save such resources is to efficiently code the data burst, use single-channel communication, and adopt ultra-low-power communication circuit techniques. Clock-data recovery (CDR) circuits are typically significant consumers of energy on traditional singlechannel communication protocols. In this paper, we present a novel single-channel protocol that does not require any CDR circuitry. The protocol is based on the novel concept of a pulsed index where data is encoded to minimize the number of ON bits, move them to the LSB end of the packet, and transmit the ON bit indices in the form of a pulse stream. The pulse count is equal to the index of the ON bit. We call this protocol Pulsed Index Communication (PIC). Beside the elimination of CDR, we show that the implementation of PIC is very area-efficient, low-power and highly tolerant of clocking differences between transmitter and receiver. We present both an FPGA and an ASIC implementation of the protocol and use them to illustrate the performance, reliability and power consumption features of PIC signaling. In particular, we show that for an ASIC implementation on 65nm technology, PIC can reduce area by more than 80% and power by more than 70% in comparison with a CDR-based serial bit transfer protocol.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115866222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Charalampos Antoniadis, G. Karakonstantis, N. Evmorfopoulos, A. Burg, G. Stamoulis
{"title":"On the statistical memory architecture exploration and optimization","authors":"Charalampos Antoniadis, G. Karakonstantis, N. Evmorfopoulos, A. Burg, G. Stamoulis","doi":"10.7873/DATE.2015.0839","DOIUrl":"https://doi.org/10.7873/DATE.2015.0839","url":null,"abstract":"The worsening of process variations and the consequent increased spreads in circuit performance and consumed power hinder the satisfaction of the targeted budgets and lead to yield loss. Corner based design and adoption of design guardbands might limit the yield loss. However, in many cases such methods may not be able to capture the real effects which might be way better than the predicted ones leading to increasingly pessimistic designs. The situation is even more severe in memories which consist of substantially different individual building blocks, further complicating the accurate analysis of the impact of variations at the architecture level leaving many potential issues uncovered and opportunities unexploited. In this paper, we develop a framework for capturing non-trivial statistical interactions among all the components of a memory/cache. The developed tool is able to find the optimum memory/cache configuration under various constraints allowing the designers to make the right choices early in the design cycle and consequently improve performance, energy, and especially yield. Our, results indicate that the consideration of the architectural interactions between the memory components allow to relax the pessimistic access times that are predicted by existing techniques.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116666002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of multi-purpose cores of Keccak and AES","authors":"P. Yalla, Ekawat Homsirikamol, J. Kaps","doi":"10.7873/DATE.2015.0834","DOIUrl":"https://doi.org/10.7873/DATE.2015.0834","url":null,"abstract":"Most widely used security protocols, Internet Protocol Security (IPSec), Secure Socket Layer (SSL), and Transport Layer Security (TLS), provide several cryptographic services which in turn require multiple dedicated cryptographic algorithms. A single cryptographic primitive for all secret key functions utilizing different mode of operations can overcome this constraint. This paper investigates the possibility of using AES and Keccak as the underlying primitives for high-speed and resource constrained applications. Even though a plain AES implementation is typically much smaller and has a better throughput to area ratio than a plain Keccak, adding additional cryptographic services changes the results dramatically. Our multi-purpose Keccak outperforms our multi-purpose AES by a factor of 4 for throughput over area on average. This underlines the flexibility of the Keccak Sponge and Duplex functions. Our multi-purpose Keccak achieves a throughput of 23.2 Gbps in AE-mode (Keyak) on a Xilinx Virtex-7 and 28.7 Gbps on a Altera Stratix-IV. In order to study this further we also implemented two versions of a dedicated Keyak and dedicated AES-GCM. Our dedicated Keyak implementation outperforms our dedicated AES-GCM on average by a factor 6 in terms of throughput over area reaching a throughput of 28.9 Gbps and 4.1 Gbps respectively on a Xilinx Virtex-7.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117255802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hardware implementation of a radial basis function neural network using stochastic logic","authors":"Yuan Ji, F. Ran, Cong Ma, D. Lilja","doi":"10.7873/DATE.2015.0377","DOIUrl":"https://doi.org/10.7873/DATE.2015.0377","url":null,"abstract":"Hardware implementations of artificial neural networks typically require significant amounts of hardware resources. This paper proposes a novel radial basis function artificial neural network using stochastic computing elements, which greatly reduces the required hardware. The Gaussian function used for the radial basis function is implemented with a two-dimensional finite state machine. The norm between the input data and the center point is optimized using simple logic gates. Results from two pattern recognition case studies, the standard Iris flower and the MICR font benchmarks, show that the difference of the average mean squared error between the proposed stochastic network and the corresponding traditional deterministic network is only 1.3% when the stochastic stream length is 10kbits. The accuracy of the recognition rate varies depending on the stream length, which gives the designer tremendous flexibility to tradeoff speed, power, and accuracy. From the FPGA implementation results, the hardware resource requirement of the proposed stochastic hidden neuron is only a few percent of the hardware requirement of the corresponding deterministic hidden neuron. The proposed stochastic network can be expanded to larger scale networks for complex tasks with simple hardware architectures.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"123 26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115747932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniele Bortolotti, Mauro Mangia, Andrea Bartolini, R. Rovatti, G. Setti, L. Benini
{"title":"An ultra-low power dual-mode ECG monitor for healthcare and wellness","authors":"Daniele Bortolotti, Mauro Mangia, Andrea Bartolini, R. Rovatti, G. Setti, L. Benini","doi":"10.7873/DATE.2015.0784","DOIUrl":"https://doi.org/10.7873/DATE.2015.0784","url":null,"abstract":"Technology scaling enables today the design of ultra-low cost wireless body sensor networks for wearable biomedical monitors. These devices, according to the application domain, show greatly varying tradeoffs in terms of energy consumption, resources utilization and reconstructed biosignal quality. To achieve minimal energy operation and extend battery life, several aspects must be considered, ranging from signal processing to the technological layers of the architecture. The recently proposed Rakeness-based Compressed Sensing (CS) expands the standard CS paradigm deploying the localization of input signal energy to further increase data compression without sensible RSNR degradation. This improvement can be used either to optimize the usage of a non volatile memory (NVM) to store in the device a record of the biosignal or to minimize the energy consumption for the transmission of the entire signal as well as some of its features. We specialize the sensing stage to achieve signal qualities suitable for both Healthcare (HC) and Wellness (WN), according to an external input (e.g. the patient). In this paper we envision a dual-operation wearable ECG monitor, considering a multi-core DSP for input biosignal compression and different technologies for either transmission or local storage. The experimental results show the effectiveness of the Rakeness approach (up to ≈ 70% more energy efficient than the baseline) and evaluate the energy gains considering different use case scenarios.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127441410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A coupling area reduction technique applying ODC shifting","authors":"Yi Diao, T. Lam, Xing Wei, Yu-Liang Wu","doi":"10.7873/DATE.2015.0768","DOIUrl":"https://doi.org/10.7873/DATE.2015.0768","url":null,"abstract":"Circuit size reduction is a basic problem in today's integrated circuit (IC) design. Besides yielding a smaller area, reducing circuit size can also provide advantages in many operations throughout the design flow, including technology mapping, verification and place-and-route. In recent years, some node based logic synthesis algorithms have been proposed for this purpose. Node Addition and Removal (NAR) and Observability Don't Cares (ODCs) based node merging were found to be quite effective in reducing the number of nodes in a netlist. However, both methods do not address the effect of re-distributing ODCs and the results are virtually fixed after one iteration run. We study the implications of redistributing ODCs and propose a node-based and wire-based coupling synthesis scheme that can effectively And better solutions with the application of ODC shifting operations. Experimental results show that this approach can produce area reductions nearly double of the pure node-based algorithms.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125383175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Kroening, Lihao Liang, T. Melham, P. Schrammel, Michael Tautschnig
{"title":"Effective verification of low-level software with nested interrupts","authors":"D. Kroening, Lihao Liang, T. Melham, P. Schrammel, Michael Tautschnig","doi":"10.7873/DATE.2015.0360","DOIUrl":"https://doi.org/10.7873/DATE.2015.0360","url":null,"abstract":"Interrupt-driven software is difficult to test and debug, especially when interrupts can be nested and subject to priorities. Interrupts can arrive at arbitrary times, leading to an explosion in the number of cases to be considered. We present a new formal approach to verifying interrupt-driven software based on symbolic execution. The approach leverages recent advances in the encoding of the execution traces of interacting, concurrent threads. We assess the performance of our method on benchmarks drawn from embedded systems code and device drivers, and experimentally compare it to conventional formal approaches that use source-to-source transformations. Our experimental results show that our method significantly outperforms conventional techniques. To the best of our knowledge, our technique is the first to demonstrate effective formal verification of low-level embedded software with nested interrupts.","PeriodicalId":162450,"journal":{"name":"2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127024250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}