2012 International Conference on Embedded Computer Systems (SAMOS): Latest Publications

Energy efficient stream-based configurable architecture for embedded platforms

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404174

F. Pratas, P. Tomás, P. Trancoso, L. Sousa

Abstract: Reconfigurable hardware can be used as an energy- and performance-efficient co-processing solution to accelerate certain types of applications. To facilitate the design of hardware accelerators we have proposed a methodology that adopts the stream-based computing model and the usage of Graphics Processing Units as prototyping platforms. In this paper we go a step further and propose a new modular architecture for low-power reconfigurable systems to easily map the stream-based algorithms. In particular, the architecture consists of a semi-programmable accelerator set that can be adapted to the application needs in terms of functional units and number of streaming engines. The proposed embedded architecture mates the flexibility of reconfigurable hardware with the advantages of stream computing for the strict needs of embedded reconfigurable devices. We show a possible organization for this architecture. Moreover, we provide a general case study to analyze the scalability of the proposed architecture in an Altera FPGA. Our experimental results show that a significant speed-up can be achieved compared to general purpose processors using low-power FPGA devices. Our preliminary estimates show that it is also possible to achieve energy savings of up to 118x.

Citations: 1

System modeling and multicore simulation using transactions

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404156

Amine Anane, E. Aboulhamid, Y. Savaria

Abstract: With the increasing complexity of digital systems that are becoming more and more parallel, a better abstraction to describe such systems has become a necessity. This paper shows how, by using the powerful mechanism of transactions as a concurrency model, and by taking advantage of .NET introspection and attribute programming capabilities, we were able to develop a system-level modeling and parallel simulation environment. We kept the same concepts to describe the architecture of high-level models, such as modules and communication channels. However, unlike SystemC, the behaviour is no longer described as processes and events but as transactions. We implemented scheduling algorithms in order to enable simulating transactional models in parallel by taking advantage of a multicore machine. These algorithms take into account the dependency between transactions and the number of cores of the simulation machine. We studied two synchronisation strategies: one using locking and the other using partitioning. An experiment made on a WiFi 802.11a transmitter achieved a speedup of about 1.9 using two threads. With 8 threads, although the workload of individual transactions was not significant, we could reach a 5.1 speedup. When the workload is significant the speedup can reach 6.3.

Citations: 3

Interleaving methods for hybrid system-level MPSoC design space exploration

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404152

R. Piscitelli, A. Pimentel

Abstract: System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system architectures. During system-level DSE, system parameters such as the number and type of processors, the type and size of memories, or the mapping of application tasks to architectural resources are considered. Simulation-based DSE, in which different design instances are evaluated using system-level simulations, is typically computationally costly. Even using high-level simulations and efficient exploration algorithms, the simulation time to evaluate design points forms a real bottleneck in such DSE. Therefore, the vast design space that needs to be searched requires effective design space pruning techniques. This paper presents and studies different strategies for interleaving fast but less accurate analytical performance estimations with slower but more accurate simulations during DSE. By interleaving these analytical estimations with simulations, our hybrid approach significantly reduces the number of simulations that are needed during the process of DSE. Experimental results have demonstrated that such hybrid DSE is a promising technique that can yield solutions of similar quality as compared to simulation-based DSE but only at a fraction of the execution time.

Citations: 7

Adaptive reinforcement learning method for networks-on-chip

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404180

F. Farahnakian, M. Ebrahimi, M. Daneshtalab, J. Plosila, P. Liljeberg

Citations: 28

BEE technology overview

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404186

Joseph Rothman, Chen Chang

Abstract: This presentation will focus on a technology overview of the BEE4 and miniBEE FPGA based reconfigurable platforms. BEEcube supplies advanced system level FPGA prototyping platforms, targeting a wide range of uses including: multi-core computer architecture, wireless communications, 100Gbps+ networking solutions, HD video processing, signal intelligence, radar/sonar array, and High Performance Computing (HPC) needs. This overview will review features, capabilities, unique technology and uses of BEE platforms on its state-of-the-art Virtex 6 based multi-array FPGA BEE4™ system, and introduce the first Research in a Box solution, the miniBEE™. miniBEE offers a combination of the latest FPGA, multicore CPU, and high-speed networking technology, all tightly coupled in one integrated cost effective solution targeting the research and lab community. This flexible system replaces the need for disjointed FPGA boards, PCs, networking devices, and test equipment. The presentation will describe how both algorithm oriented researchers as well as seasoned FPGA experts can utilize BEE technology to achieve their proof of concept or application level prototyping goals based on real time and real world data or conditions. Unique BEE technologies covered include its symmetrical Honeycomb Architecture, Full Speed Sting I/O interface, Application Control and Debugging Nectar OS, and the BEEcube Platform Studio software environment. The presentation plans to include BEE technology in action, for real-time imaging manipulation or as a flexible testing platform, an Arbitrary Waveform Generation example.

Citations: 6

Virtual prototyping for efficient multi-core ECU development of driver assistance systems

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404155

Rainer Kiesel, M. Streubühr, C. Haubelt, A. Terzis, J. Teich

Abstract: In recent years, road vehicles have experienced an enormous increase in driver assistance systems such as traffic sign recognition, lane departure warning, and pedestrian detection. Cost-efficient development of electronic control units (ECUs) for these systems is a complex challenge. The demand for shortened time to market makes the development even more challenging and thus demands efficient design flows. This paper proposes a model-based design flow that permits simulation-based performance evaluation of multi-core ECUs for driver assistance systems in a pre-development stage. The approach is based on a system-level virtual prototype of a multi-core ECU and allows the evaluation of timing effects by mapping application tasks to different platforms. The results show that performance estimation of different parallel implementation candidates is possible with high accuracy even in a pre-development stage. By adapting the best-fitting parallelization strategy to the final ECU, a reduction in the time to market period is possible. Currently, the design flow is being evaluated by Daimler AG and is being applied to a pedestrian detection system. Results from this application illustrate the benefits of the proposed approach.

Citations: 0

Model-driven robot-software design using integrated models and co-simulation

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404197

J. Broenink, Yunyun Ni

Citations: 22

Maximum performance computing for exascale applications

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404150

O. Mencer

Abstract: Summary form only given. Ever since Fermi, Pasta and Ulam conducted the first fundamentally important numerical experiments in 1953, science has been driven by the progress of available computational capability. In particular, computational quantum chemistry and computational quantum physics depend on ever increasing amounts of computation. However, due to power density limitations at the chip we have seen the end of single CPU performance scaling. Now the challenge is to improve compute performance through some form of parallel processing without incurring power limits at the system level. One way to deal with the system “power wall” question is to ask “what is the maximum amount of computation that can be achieved within a certain power budget”. We argue that such Maximum Performance Computing needs to focus on end-to-end execution time of complete scientific applications and needs to include a multi-disciplinary approach, bringing together scientists and engineers to optimize the whole process from mathematics and algorithms all the way down to arithmetic and number representation. We have done a number of such multidisciplinary studies with our customers (Chevron, Schlumberger, and JP Morgan). Our current results with Maxeler Dataflow Engines for production PDE solver applications in Earth Sciences and Finance show an improvement of 20-40x in Speed and/or Watts per application run.

Citations: 6

Out-Of-order execution of synchronous data-flow networks

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/SAMOS.2012.6404171

D. Baudisch, J. Brandt, K. Schneider

Abstract: Data flow process networks (DPNs) have been introduced as a convenient model of computation for distributed and asynchronous systems since each process node can work independently of the other nodes, i.e. without the need of a global coordination. Synchronous and cyclo-static data flow process networks even allow efficient static schedules to be derived at compile time, so that these systems can run with an efficient use of available resources, e.g. in embedded systems. Single process nodes of DPNs are stream-based computing devices that transform input streams to uniquely defined corresponding output streams such that single values of the output streams are computed as soon as sufficient input values are available. In this sense, they are related to the execution of an instruction stream by a conventional microprocessor. In this paper, we show how out-of-order execution that has been introduced for the efficient use of multiple functional units in microprocessors can also be used for the implementation of DPNs on multiprocessors. This way, the implementation of DPNs on multiprocessors allows one to optimize the throughput of single process nodes, and as shown by our experiments, also of the entire DPN.

Citations: 14

A template-based methodology for efficient microprocessor and FPGA accelerator co-design

2012 International Conference on Embedded Computer Systems (SAMOS) Pub Date : 2012-07-16 DOI: 10.1109/samos.2012.6404153

A. Kritikakou, F. Catthoor, G. Athanasiou, Vasilios I. Kelefouras, C. Goutis

Citations: 2