Title: Plasmon-based Virus Detection on Heterogeneous Embedded Systems
Authors: Olaf Neugebauer, Pascal Libuschewski, M. Engel, H. Müller, P. Marwedel
Published in: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015
DOI: https://doi.org/10.1145/2764967.2764976
Abstract: Embedded systems, e.g. in computer vision applications, are expected to provide significant amounts of computing power to process large data volumes. Many of these systems, such as those used in medical diagnosis, are mobile devices and face significant challenges in providing sufficient performance while operating on a constrained energy budget. Modern embedded MPSoC platforms use heterogeneous CPU and GPU cores, providing a large number of optimization parameters. This makes it possible to find useful trade-offs between energy consumption and performance for a given application. In this paper, we describe how the complex data processing required for PAMONO, a novel type of biosensor for the detection of biological viruses, can be implemented efficiently on a state-of-the-art heterogeneous MPSoC platform. An additional optimization dimension explored is the achieved quality of service: reducing the virus detection accuracy enables optimizations not achievable by modifying hardware or software parameters alone. Instead of relying on often inaccurate simulation models, our design space exploration employs a hardware-in-the-loop approach to evaluate performance and energy consumption on the embedded target platform. Trade-offs between performance, energy, and accuracy are controlled by a genetic algorithm running on a PC control system, which deploys the evaluation tasks to a number of connected embedded boards. Using our optimization approach, we are able to achieve frame rates meeting the requirements without losing accuracy. Furthermore, our approach is able to reduce the energy consumption by 93% while retaining a still reasonable detection quality.

{"title":"Adaptive Isolation for Predictable MPSoC Stream Processing","authors":"J. Teich","doi":"10.1145/2764967.2771821","DOIUrl":"https://doi.org/10.1145/2764967.2771821","url":null,"abstract":"Resource sharing and interferences of multiple threads of one, but even worse between multiple application programs running concurrently on a Multi-Processor System-on-a-Chip (MPSoC) today make it very hard to provide any timing or throughput-critical applications with time bounds. Additional interferences result from the interaction of OS functions such as thread multiplexing and scheduling as well as complex resource (e.g., cache) reservation protocols used heavily today. Finally, dynamic power and temperature management on a chip might also throttle down processor speed at arbitrary times leading to additional variations and jitter in execution time. This may be intolerable for many safety-critical applications such as medical imaging or automotive driver assistance systems. Static solutions to provide the required isolation by allocating distinct resources to safety-critical applications may not be feasible for reasons of cost and due to the lack of efficiency and inflexibility. Also, shutting off or restricting temperature and power management might not be tolerable. In this keynote, we propose new techniques for adaptive isolation of resources including processor, I/O, memory as well as communication resources on demand on an MPSoC based on the paradigm of Invasive Computing. In Invasive Computing, a programmer may specify bounds on the execution quality of a program or even single segments of a program followed by an invade command. This system returns a constellation of exclusive resources called a claim that is subsequently used in a by-default non-shared way until being released again by the invader. Through this principle, it becomes possible to isolate applications automatically and in an on-demand manner. In invasive computing, isolation is supported on all levels of hardware and software including an invasive OS. In case of an abundant number of cores available on an MPSoC today, the problem still becomes how to find suitable claims that will guarantee a performance bound in a negligible amount of time? For a broad class of streaming applications, we propose a combined static/dynamic approach based on a static design space exploration phase to extract a set of satisfying claim characteristics for which program execution is guaranteed to stay within the desired performance bounds. For a class of compositional and heterogeneous MPSoC systems, only very little information must then be passed to the OS for run-time claim search in the form of so-called CCGs (claim constraint graphs). A special role here plays a compositional Network-on-a-Chip (NoC) architecture that allows to invade guaranteed bandwith between processor, memory and I/O tiles independently from other applications. 
We demonstrate the above concepts for a complex object detection application algorithm chain taken from robot vision to show jitter-minimized implementations become possible, even for statically unknown arrivals of other concurrent applications.","PeriodicalId":110157,"journal":{"name":"Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems","volume":"253 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116391555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
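The invade/claim/release life cycle described in the keynote can be pictured with the small Python sketch below. It is a conceptual illustration only; the class and method names are made up and do not reflect any actual invasive-computing API.

class Claim:
    def __init__(self, cores):
        self.cores = cores            # resources granted exclusively

class InvasiveRuntime:
    def __init__(self, free_cores):
        self.free_cores = free_cores

    def invade(self, min_cores):
        """Return a claim satisfying the constraint, or None if infeasible."""
        if self.free_cores >= min_cores:
            self.free_cores -= min_cores
            return Claim(min_cores)
        return None

    def infect(self, claim, kernel, data):
        # Run the program segment on the claimed (non-shared) resources.
        return [kernel(item) for item in data]   # stand-in for parallel execution

    def retreat(self, claim):
        self.free_cores += claim.cores           # release the resources again

rt = InvasiveRuntime(free_cores=8)
claim = rt.invade(min_cores=4)                   # e.g. derived from a claim constraint graph
if claim is not None:
    result = rt.infect(claim, kernel=lambda x: x * x, data=range(10))
    rt.retreat(claim)
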
{"title":"Efficient Compilation of Stream Programs for Heterogeneous Architectures: A Model-Checking based approach","authors":"R. K. Thakur, Y. Srikant","doi":"10.1145/2764967.2764968","DOIUrl":"https://doi.org/10.1145/2764967.2764968","url":null,"abstract":"Stream programming based on the synchronous data flow (SDF) model naturally exposes data, task and pipeline parallelism. Statically scheduling stream programs for homogeneous architectures has been an area of extensive research. With graphic processing units (GPUs) now emerging as general purpose co-processors, scheduling and distribution of these stream programs onto heterogeneous architectures (having both GPUs and CPUs) provides for challenging research. Exploiting this abundant parallelism in hardware, and providing a scalable solution is a hard problem. In this paper we describe a coarse-grained software pipelined scheduling algorithm for stream programs which statically schedules a stream graph onto heterogeneous architectures. We formulate the problem of partitioning the work between the CPU cores and the GPU as a model-checking problem. The partitioning process takes into account the costs of the required buffer layout transformations associated with the partitioning and the distribution of the stream graph. The solution trace result from the model checking provides a map for the distribution of actors across different processors/-cores. This solution is then divided into stages, and then a coarse grained software-pipelined code is generated. We use CUDA streams to map these programs synergistically onto the CPU and GPUs. We use a performance model for data transfers to determine the optimal number of CUDA streams on GPUs. Our software-pipelined schedule yields a speedup of upto 55.86X and a geometric mean speedup of 9.62X over a single threaded CPU.","PeriodicalId":110157,"journal":{"name":"Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125479194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is dynamic compilation possible for embedded systems?","authors":"H. Charles, V. Lomüller","doi":"10.1145/2764967.2782785","DOIUrl":"https://doi.org/10.1145/2764967.2782785","url":null,"abstract":"JIT compilation and dynamic compilation are powerful techniques allowing to delay the final code generation to the runtime. There is many benefits: improved portability, virtual machine security, etc. Unforturnately the tools used for JIT compilation and dynamic compilation does not met the classical requirement for embedded platforms: memory size is huge and code generation has big overheads. In this paper we show how dynamic code specialization (JIT) can be used and be beneficial in terms of execution speed and energy consumption with memory footprint kept under control. We based our approaches on our tool deGoal and on LLVM, that we extended to be able to produce lightweight runtime specializers from annotated LLVM programs. Benchmarks are manipulated and transformed into templates and a specialization routine is build to instantiate the routines. Such approach allows to produce efficient specializations routines, with a minimal energy consumption and memory footprint compare to a generic JIT application. Through some benchmarks, we present its efficiency in terms of speed, energy and memory footprint. We show that over static compilation we can achieve a speed-up of 21 % in terms of execution speed but also a 10 % energy reduction with a moderate memory footprint.","PeriodicalId":110157,"journal":{"name":"Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114081048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy Efficient Message Passing Synchronization Algorithm for Concurrent Data Structures in Embedded Systems","authors":"Lazaros Papadopoulos, D. Soudris","doi":"10.1145/2764967.2771931","DOIUrl":"https://doi.org/10.1145/2764967.2771931","url":null,"abstract":"Nowadays, modern multicore embedded systems often execute complex applications that rely heavily on concurrent data structures. Databases on embedded microservers, file systems and stream processing algorithms belong in application domains that normally utilize concurrent data structures to store and process their data. The prevalent lock-based synchronization methods based on mutexes provide poor scalability and, most importantly, they lead to high energy consumption, which is an important constraint on embedded systems. In this work, we propose an energy efficient synchronization model for embedded system architectures based on message-passing communication. Our results show that concurrent data structures based on the proposed model provide lower power consumption in comparison with the corresponding lock-based implementations, along with comparable performance.","PeriodicalId":110157,"journal":{"name":"Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133871892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling
Authors: T. Schwarzer, J. Falk, M. Glaß, J. Teich, C. Zebelein, C. Haubelt
Published in: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015
DOI: https://doi.org/10.1145/2764967.2764972
Abstract: Application modeling using dynamic dataflow graphs is well-suited for multi-core platforms. However, there is often a mismatch between the fine granularity of the application and the platform. Tailoring this granularity to the platform promises performance gains by (a) reducing dynamic scheduling overhead and (b) exploiting compiler optimizations. In this paper, we propose a throughput-optimizing compilation approach that uses Quasi-Static Schedules (QSSs) to combine actors of static dataflow subgraphs. Our approach combines core allocation, QSS, and actor binding in a Design Space Exploration (DSE), optimizing throughput for a given number of available cores. During the DSE, each implementation candidate is compiled to and evaluated on the target hardware, here an Intel i7 and an ARM Cortex-A9. Experimental results on synthetic benchmarks as well as a real-world control application show that our holistic compilation approach outperforms classic DSEs that are agnostic of QSS, as well as a DSE that employs QSS as a post-processing step. Among other results, we show a case where our compilation approach obtains a speedup of 9.91x for a 4-core implementation, while a classic DSE only obtains a speedup of 2.12x.

Title: Application-Specific Architecture Exploration Based on Processor-Agnostic Performance Estimation
Authors: Juan Fernando Eusse Giraldo, L. Murillo, C. McGirr, R. Leupers, G. Ascheid
Published in: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015
DOI: https://doi.org/10.1145/2764967.2771932
Abstract: Early design decisions such as the selection of the architectural class and the instruction set largely determine the performance and energy consumption of application-specific processors (ASIPs). However, making decisions that effectively translate into high performance requires a careful analysis of the target application by an experienced designer. This process is extremely time consuming, and confirmation that the processor meets the application requirements can only be obtained after costly architectural implementation, synthesis, and simulation. To shorten design times, this work couples High-Level Synthesis (HLS) with pre-architectural performance estimation, with the aim of providing designers with an initial architectural seed together with quantitative feedback about its performance. This enables a light-weight refinement process based on the obtained feedback, such that the time-consuming microarchitectural implementation is done only once, at the end of the refinement steps. We employed our flow to generate four candidate ASIPs for a 1024-point FFT. Validation of the estimates and evaluation of the gains are performed on actual ASIP implementations, which achieve performance gains of up to 8.42x and energy gains of up to 1.32x over an existing VLIW processor.

Title: Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays
Authors: É. Sousa, Frank Hannig, J. Teich, Qingqing Chen, Ulf Schlichtmann
Published in: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015
DOI: https://doi.org/10.1145/2764967.2771933
Abstract: Massively Parallel Processor Arrays (MPPAs) lend themselves well to portable devices such as tablets and smartphones. However, applications running on mobile platforms require a certain performance level or quality (e.g., high-resolution image processing) that needs to be satisfied while adhering to a power budget and a temperature threshold. As a solution to these challenges, we consider a resource-aware computing paradigm to exploit runtime adaptation without violating any thermal or power constraint in a programmable MPPA. For estimating the power consumption, we developed a mathematical model based on the post-synthesis implementation of an MPPA in different CMOS technologies, while the temperature variation was emulated. We showcase our hardware/software mechanism for loading new configurations into the accelerator on the fly, considering quality/throughput trade-offs for image processing applications. The results show that the average power consumption of Sobel and Laplace operators using different numbers of processing elements amounts to 1.24 mW and 10.35 mW, respectively. Furthermore, only 1.64 μs are necessary for configuring a class of MPPA running at 550 MHz.

Title: VLIW Code Generation for a Convolutional Network Accelerator
Authors: Maurice Peemen, W. Pramadi, B. Mesman, H. Corporaal
Published in: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015
DOI: https://doi.org/10.1145/2764967.2771928
Abstract: This paper presents a compiler flow that maps Deep Convolutional Networks (ConvNets) to a highly specialized VLIW accelerator core targeting the low-power embedded market. Earlier works have focused on energy-efficient accelerators for this class of algorithms, but none of them provides a complete and practical programming model. Due to the large parameter set of a ConvNet, it is essential that the user can abstract from the accelerator architecture and does not have to rely on an error-prone, ad-hoc assembly programming model. By using modulo scheduling for software pipelining, we demonstrate that our automatically generated code achieves hardware utilization equal to, or within 5-20% of, code written manually by experts. Our compiler removes the huge manual workload of efficiently mapping ConvNets to an energy-efficient core for the next generation of mobile and wearable devices.

{"title":"A model-based, single-source approach to design-space exploration and synthesis of mixed-criticality systems","authors":"F. Herrera, P. Peñil, E. Villar","doi":"10.1145/2764967.2784777","DOIUrl":"https://doi.org/10.1145/2764967.2784777","url":null,"abstract":"While the Moore's Law is still in place, the complexity of embedded systems continues to growth exponentially. Embedded Systems are implemented on complex HW/SW platforms, requiring more powerful design methods and tools. Electronic System Level (ESL) design [1] proposes to raise the level of abstraction in which the system is modeled in order to allow the analysis and optimization of the system at earlier stages of the design process.","PeriodicalId":110157,"journal":{"name":"Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133331169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}