I. Latifis, Karthick Parashar, G. Dimitroulakos, H. Cappelle, Christakis Lezos, K. Masselos, F. Catthoor
{"title":"Matlab to C compilation targeting Application Specific Instruction Set Processors","authors":"I. Latifis, Karthick Parashar, G. Dimitroulakos, H. Cappelle, Christakis Lezos, K. Masselos, F. Catthoor","doi":"10.3850/9783981537079_0426","DOIUrl":"https://doi.org/10.3850/9783981537079_0426","url":null,"abstract":"This paper discusses a MATLAB to C compiler exploiting custom instructions such as instructions for SIMD processing and instructions for complex arithmetic present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. By doing this the generated code can be used as input to any C/C++ compiler. Thus the proposed compiler allows the description of the specialized instruction set of the target processor in a parameterized way allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speed up between 2x-30x on the targeted ASIP for six DSP benchmarks compared to the code generated by Mathworks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce the development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems / system-on-chip development context while still improving implementation efficiency.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126112484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Long Cheng, Kai Huang, Gang Chen, Biao Hu, A. Knoll
{"title":"Minimizing peak temperature for pipelined hard real-time systems","authors":"Long Cheng, Kai Huang, Gang Chen, Biao Hu, A. Knoll","doi":"10.3850/9783981537079_0556","DOIUrl":"https://doi.org/10.3850/9783981537079_0556","url":null,"abstract":"This paper addresses the problem of minimizing the peak temperature for pipelined multi-core systems under hard end-to-end deadline constraints by adversely using the Pay-Burst-Only-Once principle. The Periodic Thermal Management is adopted to control the temperature and every core is periodically switched between two power modes. With the peak temperature representation, we first formulate the problem of finding the thermal optimal periodic schemes which satisfies deadline constraints and then present a fast heuristic algorithm to solve it. Adopting real life processor platforms and applications, our simulation demonstrates that our approach reduces the peak temperature by up to 15°C on the 4-stage ARM platform compared to sub-deadline partition approach. Moreover, our algorithm is shown to be scalable w.r.t. the number of pipelined stages and its effectiveness is validated by the brutally searching approach.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129465740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Captopril: Reducing the pressure of bit flips on hot locations in non-volatile main memories","authors":"Majid Jalili, H. Sarbazi-Azad","doi":"10.3850/9783981537079_0032","DOIUrl":"https://doi.org/10.3850/9783981537079_0032","url":null,"abstract":"High static power consumption and insufficient scalability of the commonly used DRAM main memory technology appear to be tough challenges in upcoming years. Hence, adopting new technologies, i.e. non-volatile memories (NVMs), is a proper choice. NVMs tolerate a low number of write operations while having good scalability and low static power consumption. Due to the non-destructive nature of a read operation and the long latency of a write operation in NVMs, designers use read-before-write (RBW) mechanism to mask the unchanged bits during write operation in order to reduce bit flips. Based on this observation that some specific locations of blocks are responsible for the majority of bit flips, we extend the RBW to further reduce the number of bit flips per write in the memory system. The results taken from full-system simulations reveal that our proposal, called Captopril, can reduce the number of bit flips by 21% and 9%, on average, compared to the baseline and state-of-the-art designs, respectively.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128435201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enrico Fraccaroli, M. Lora, S. Vinco, D. Quaglia, F. Fummi
{"title":"Integration of mixed-signal components into virtual platforms for holistic simulation of smart systems","authors":"Enrico Fraccaroli, M. Lora, S. Vinco, D. Quaglia, F. Fummi","doi":"10.3850/9783981537079_0376","DOIUrl":"https://doi.org/10.3850/9783981537079_0376","url":null,"abstract":"Nowadays, the design of applications based on smart systems requires the joint simulation of both digital and analog aspects. Even if analog-mixed-signal (AMS) extensions of hardware description languages are an enabling factor, they do not provide a general methodology for the integration of AMS models into digital virtual platforms. This paper defines the problem and provides two main contributions: 1) the automatic conversion of analog models from Verilog-AMS to C++/SystemC, to remove the overhead of co-simulation with traditional virtual platform tools, and 2) the automatic abstraction of analog conservative models, with the goal of increasing simulation speed. Experimental results show that the virtual platform with automatically integrated analog components is 40 times faster than co-simulation with Verilog-AMS, and the increase of speed due to abstraction is more than 100%.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127184161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting CPU-load and data correlations in multi-objective VM placement for geo-distributed data centers","authors":"A. Pahlevan, P. Valle, David Atienza Alonso","doi":"10.3850/9783981537079_0143","DOIUrl":"https://doi.org/10.3850/9783981537079_0143","url":null,"abstract":"Cloud computing has been proposed as a new paradigm to deliver services over the internet. The proliferation of cloud services and increasing users' demands for computing resources have led to the appearance of geo-distributed data centers (DCs). These DCs host heterogeneous applications with changing characteristics, like the CPU-load correlation, that provides significant potential for energy savings when the utilization peaks of two virtual machines (VMs) do not occur at the same time, or the amount of data exchanged between VMs, that directly impacts performance, i.e. response time. This paper presents a two-phase multi-objective VM placement, clustering and allocation algorithm, along with a dynamic migration technique, for geo-distributed DCs coupled with renewable and battery energy sources. It exploits the holistic knowledge of VMs characteristics, CPU-load and data correlations, to tackle the challenges of operational cost optimization and energyperformance trade-off. Experimental results demonstrate that the proposed method provides up to 55% operational cost savings, 15% energy consumption, and 12% performance (response time) improvements when compared to state-of-the-art schemes.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129968082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shanker Shreejith, Bezborah Anshuman, Suhaib A. Fahmy
{"title":"Accelerated Artificial Neural Networks on FPGA for fault detection in automotive systems","authors":"Shanker Shreejith, Bezborah Anshuman, Suhaib A. Fahmy","doi":"10.3850/9783981537079_0337","DOIUrl":"https://doi.org/10.3850/9783981537079_0337","url":null,"abstract":"Modern vehicles are complex distributed systems with critical real-time electronic controls that have progressively replaced their mechanical/hydraulic counterparts, for performance and cost benefits. The harsh and varying vehicular environment can induce multiple errors in the computational/ communication path, with temporary or permanent effects, thus demanding the use of fault-tolerant schemes. Constraints in location, weight, and cost prevent the use of physical redundancy for critical systems in many cases, such as within an internal combustion engine. Alternatively, algorithmic techniques like artificial neural networks (ANNs) can be used to detect errors and apply corrective measures in computation. Though adaptability of ANNs presents advantages for fault-detection and fault-tolerance measures for critical sensors, implementation on automotive grade processors may not serve required hard deadlines and accuracy simultaneously. In this work, we present an ANN-based fault-tolerance system based on hybrid FPGAs and evaluate it using a diesel engine case study. We show that the hybrid platform outperforms an optimised software implementation on an automotive grade ARM Cortex M4 processor in terms of latency and power consumption, also providing better consolidation.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131072676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy efficiency in cloud-based MapReduce applications through better performance estimation","authors":"Seyed Morteza Nabavinejad, M. Goudarzi","doi":"10.3850/9783981537079_0851","DOIUrl":"https://doi.org/10.3850/9783981537079_0851","url":null,"abstract":"An important issue for efficient execution of MapReduce jobs on a cloud platform is selecting the best fitting virtual machine (VM) configuration(s) among the miscellany of choices that cloud providers offer. Wise selection of VM configurations can lead to better performance, cost and energy consumption. Therefore, it is crucial to explore the available configurations and choose the best one for each given MapReduce application. Executing the given application on all the configurations for comparison is a costly, time and energy consuming process. An alternative is to run the application on a subset of configurations (sample configurations) and estimate its performance on other configurations based on the obtained values on those sample configurations. We show that the choice of these sample configurations highly affects accuracy of later estimations. Our Smart Configuration Selection (SCS) scheme chooses better representatives from among all configurations by once-off analysis of given performance figures of the benchmarks so as to increase the accuracy of estimations of missing values, and consequently, to more accurately choose the configuration providing the highest performance. The results show that the SCS choice of sample configurations is very close to the best choice, and can reduce estimation error to 7.11% from the original 16.02% of random configuration selection. Furthermore, this more accurate performance estimation saves 24.3% energy on average.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"2 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133007034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic Error Models for machine learning kernels implemented on stochastic nanoscale fabrics","authors":"Sai Zhang, Naresh R Shanbhag","doi":"10.3850/9783981537079_0407","DOIUrl":"https://doi.org/10.3850/9783981537079_0407","url":null,"abstract":"Presented in this paper are probabilistic error models for machine learning kernels implemented on low-SNR circuit fabrics where errors arise due to voltage overscaling (VOS), process variations, or defects. Four different variants of the additive error model are proposed that describe the error probability mass function (PMF): additive over Reals Error Model with independent Bernoulli RVs (REM-i), additive over Reals Error Model with joint Bernoulli random variables (RVs) (REM-j), additive over Galois field Error Model with independent Bernoulli RVs (GEM-i), and additive over Galois field Error Model with joint Bernoulli RVs (GEM-j). Analytical expressions for the error PMF is derived. Kernel level model validation is accomplished by comparing the Jensen-Shannon divergence DJS between the modeled PMF and the PMFs obtained via HDL simulations in a commercial 45nm CMOS process of MAC units used in a 2nd order polynomial support vector machine (SVM) to classify data from the UCI machine learning repository. Results indicate that at the MAC unit level, DJS for the GEM-j models are 1-to-2-orders-of-magnitude lower (better) than the REM models for VOS and process variation errors. However, when considering errors due to defects, DJS for REM-j is between 1-to-2-orders-of-magnitude lower than the others. Performance prediction of the SVM using these models indicate that when compared with Monte Carlo with HDL generated error statistics, probability of detection pdet estimated using GEM-j is within 3% for VOS error when the error rate pη ≤ 80%, and within 5% for process variation error when supply voltage Vdd is between 0.3V and 0.7V. In addition, pdet using REM-j is within 2% for defect errors when the defect rate (the percentage of circuit nets subject to stuck-at-faults) psaf is between 10-3 and 0.2.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132163725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trace-based analysis methodology of program flash contention in embedded multicore systems","authors":"Lin Li, A. Mayer","doi":"10.3850/9783981537079_0442","DOIUrl":"https://doi.org/10.3850/9783981537079_0442","url":null,"abstract":"Contention for shared resources is a major performance issue in multicore systems. In embedded multicore microcontrollers, contentions of program flash accesses have a significant performance impact, because the flash access has a large latency compared to a core clock cycle. Therefore, the detection and analysis of program flash contentions are necessary to remedy this situation. With a lack of existing tools being able to fulfill this task, a novel post-processing analysis methodology is proposed in this paper to acquire the information of program flash contentions in detail based on the non-intrusive trace. This information can be utilized to improve the overall performance and particularly to enhance the real-time performance of specific threads or functions for hard real-time multicore systems.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130916208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andres Gomez, L. Sigrist, M. Magno, L. Benini, L. Thiele
{"title":"Dynamic energy burst scaling for transiently powered systems","authors":"Andres Gomez, L. Sigrist, M. Magno, L. Benini, L. Thiele","doi":"10.3850/9783981537079_0403","DOIUrl":"https://doi.org/10.3850/9783981537079_0403","url":null,"abstract":"Energy harvesting is generally seen to be the key to power cyber-physical systems in a low-cost, long term, efficient manner. However, harvesting has traditionally been coupled with large energy storage devices to mitigate the effects of the source's variability. The emerging class of transiently powered systems avoids this issue by performing computation only as a function of the harvested energy, minimizing the obtrusive and expensive storage element. In this work, we present an efficient Energy Management Unit (EMU) to supply generic loads when the average harvested power is much smaller than required for sustained system operation. By building up charge to a pre-defined energy level, the EMU can generate short energy bursts predictably, even under variable harvesting conditions. Furthermore, we propose a dynamic energy burst scaling (DEBS) technique to adjust these bursts to the load's requirements. Using a simple interface, the load can dynamically configure the EMU to supply small bursts of energy at its optimal power point, independent from the harvester's operating point. Extensive theoretical and experimental data demonstrate the high energy efficiency of our approach, reaching up to 73.6% even when harvesting only 110 μW to supply a load of 3.89mW.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127879095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}