Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems: Latest Articles

Exploiting Predictability in Dynamic Network Communication for Power-Efficient Data Transmission in LTE Radio Systems
Peter Brand, Jonathan Ah Sue, J. Brendel, J. Falk, R. Hasholzner, Jürgen Teich, S. Wildermann
Abstract: In embedded systems powered by batteries, power is undoubtedly a critical resource, making power management an important topic in the design phase. Even though power management is a heavily researched topic, most approaches focus on improving the way the power manager reacts to outside control events. In this paper, we propose techniques that not only react to these outside control events but also try to predict them in advance, thus broadening the capabilities of any employed power manager by allowing for superior transition decisions and even saving redundant calculations. We present results on employing a predictive power management system that couples a classic dynamic power manager with a machine learning subsystem in the context of a mobile device in a Long Term Evolution (LTE) system, with emphasis on evaluating the potential of saving power as well as the handling of the induced prediction uncertainty. First, we examine the LTE communication protocol and showcase certain control data that has to be received periodically, but may contain no information for the receiver. Finally, we show a proof-of-concept, based on real LTE traces and hardware simulation, that prediction of this information can be leveraged to allow for a far superior decision process compared to a non-predicting system. Here, we achieve a theoretical best-case power saving of 15% for an idealized prediction with 100% accuracy and no additional power consumption.
DOI: https://doi.org/10.1145/3078659.3078670
Published: 2017-06-12
Citations: 3
TETRiS: a Multi-Application Run-Time System for Predictable Execution of Static Mappings
Andrés Goens, R. Khasanov, J. Castrillón, Marcus Hähnel, Till Smejkal, Hermann Härtig
Abstract: For embedded system software, it is common to use static mappings of tasks to cores. This becomes considerably more challenging in multi-application scenarios. In this paper, we propose TETRiS, a multi-application run-time system for static mappings for heterogeneous system-on-chip architectures. It leverages compile-time information to map and migrate tasks in a fashion that preserves the predictable performance of using static mappings, allowing the system to accommodate multiple applications. TETRiS runs on off-the-shelf embedded systems and is Linux-compatible. We embed our approach in a state-of-the-art compiler for multicore systems and evaluate the proposed run-time system on a modern heterogeneous platform using realistic benchmarks. We present two experiments whose execution time and energy consumption are comparable to those obtained by the highly optimized Linux scheduler CFS, and where execution time variance is reduced by a factor of 510, and energy consumption variance by a factor of 83.
DOI: https://doi.org/10.1145/3078659.3078663
Published: 2017-06-12
Citations: 28
Combining Dataflow Applications and Real-time Task Sets on Multi-core Platforms
H. Ali, B. Akesson, L. M. Pinho
Abstract: Future real-time embedded systems will increasingly incorporate mixed application models with timing constraints running on the same multi-core platform. These application models are dataflow applications with timing constraints and traditional real-time applications modelled as independent arbitrary-deadline tasks. These systems require guarantees that all running applications execute satisfying their timing constraints. Also, to be cost-efficient in terms of design, they require efficient mapping strategies that maximize the use of system resources to reduce the overall cost. This work proposes an approach to integrate mixed application models (dataflow and traditional real-time applications) with timing requirements on the same multi-core platform. It comprises three main algorithms: 1) Slack-Based Merging, 2) Timing Parameter Extraction, and 3) Communication-Aware Mapping. Together, these three algorithms enable the mapping and scheduling of mixed application models in embedded real-time systems. The complete approach and the three algorithms presented have been validated through proofs and experimental evaluation.
DOI: https://doi.org/10.1145/3078659.3078671
Published: 2017-06-12
Citations: 1
Enabling zero-copy OpenMP offloading on the PULP many-core accelerator
Alessandro Capotondi, A. Marongiu
Abstract: Many-core heterogeneous designs are nowadays widely available among embedded systems. Initiatives such as the HSA push for a model where the host processor and the accelerator(s) communicate via coherent, Unified Virtual Memory (UVM). In this paper we describe our experience in porting the OpenMP v4 programming model to a low-end, heterogeneous embedded system based on the PULP many-core accelerator featuring lightweight (software-managed) UVM support. We describe a GCC-based toolchain which enables: i) the automatic generation of host and accelerator binaries from a single, high-level, OpenMP parallel program; ii) the automatic instrumentation of the accelerator program to transparently manage UVM. This enables up to 4x faster execution compared to traditional copy-based offload mechanisms.
DOI: https://doi.org/10.1145/3078659.3079071
Published: 2017-06-12
Citations: 5
Stencil Autotuning with Ordinal Regression: Extended Abstract
Biagio Cosenza, J. Durillo, Stefano Ermon, B. Juurlink
Abstract: The increasing performance of today's computer architecture comes with an unprecedented increase in hardware complexity. Unfortunately, this results in difficult-to-tune software and consequently in a gap between the potential peak performance and the actual performance. Automatic tuning is an emerging approach that assists the programmer in managing this complexity. State-of-the-art autotuners are limited, though: they either require long tuning times, e.g., due to iterative searches, or cannot tackle the complexity of the problem due to the limitation of the supervised machine learning (ML) methodologies used. In particular, traditional ML autotuning approaches exploiting classification algorithms (such as neural networks and support vector machines) face difficulties in capturing all features of large search spaces. We propose a new way of performing automatic tuning based on structural learning: the tuning problem is formulated as a version-ranking prediction problem and solved using ordinal regression. We demonstrate its potential on a well-known autotuning problem: stencil computations. We compare state-of-the-art iterative compilation methods with our ordinal regression approach and analyze the quality of the obtained ranking in terms of Kendall rank correlation coefficients.
DOI: https://doi.org/10.1145/3078659.3078664
Published: 2017-06-12
Citations: 3
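
The abstract above evaluates ranking quality with Kendall rank correlation coefficients. As a reader's aid, here is a minimal sketch of the tau-a variant, assuming no tied values. The function name `kendall_tau` and the sample data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Assumes both sequences have the same length (>= 2) and contain no ties.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when both rankings order it the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: measured run times of four code versions and the
# scores a ranking model assigned to them.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [0.1, 0.3, 0.2, 0.9]
tau = kendall_tau(measured, predicted)  # 5 concordant, 1 discordant pair
```

A value of 1 means the predicted ranking orders every pair like the measured one, and -1 means every pair is reversed.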
Numerical Accuracy Improvement by Interprocedural Program Transformation
Nasrine Damouche, M. Martel, Alexandre Chapoutot
Abstract: Floating-point numbers are used to approximate the exact real numbers in a wide range of domains like numerical simulations, embedded software, etc. However, floating-point numbers are a finite approximation of real numbers. In practice, this approximation may introduce round-off errors and this can lead to catastrophic results. To cope with this issue, we have developed a tool which partly corrects these round-off errors and which consequently improves the numerical accuracy of computations by automatically transforming programs in a source-to-source manner. Our transformation relies on static analysis by abstract interpretation and operates on pieces of code with assignments, conditionals and loops. In former work, we have focused on the intraprocedural transformation of programs and, in this article, we introduce the interprocedural transformation to improve accuracy.
DOI: https://doi.org/10.1145/3078659.3078662
Published: 2017-06-12
Citations: 6
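
To illustrate why rewriting a computation can change its floating-point accuracy, the sketch below sums the same three values in two mathematically equivalent orders. It is a hand-written example under stated assumptions (IEEE 754 double precision, a hypothetical helper `sum_left_to_right`), and it does not reproduce the tool or the transformation described in the abstract above.

```python
import math
from fractions import Fraction


def sum_left_to_right(values):
    """Accumulate in the given order, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [1e16, 1.0, -1e16]

# Exact reference computed with rational arithmetic.
exact = float(sum(Fraction(v) for v in values))

# In this order the 1.0 is absorbed, because doubles near 1e16 are spaced 2 apart.
naive = sum_left_to_right(values)

# Reordering cancels the large terms first, so the 1.0 survives.
reordered = sum_left_to_right([1e16, -1e16, 1.0])

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
compensated = math.fsum(values)
```

Here `naive` evaluates to 0.0 while `reordered`, `compensated` and `exact` all evaluate to 1.0, although the two orderings are equal over the real numbers.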
Self-Adaptive FPGA-Based Image Processing Filters Using Approximate Arithmetics
Jutta Pirkl, Andreas Becher, Jorge Echavarria, J. Teich, S. Wildermann
Abstract: Approximate Computing aims at trading off computational accuracy against improvements regarding performance, resource utilization and power consumption by making use of the capability of many applications to tolerate a certain loss of quality. A key issue is the dependency of the impact of approximation on the input data as well as user preferences and environmental conditions. In this context, we therefore investigate the concept of self-adaptive image processing that is able to autonomously adapt 2D-convolution filter operators of different accuracy degrees by means of partial reconfiguration on Field-Programmable Gate Arrays (FPGAs). Experimental evaluation shows that the dynamic system is able to better exploit a given error tolerance than any static approximation technique due to its responsiveness to changes in input data. Additionally, it provides a user control knob to select the desired output quality via the metric threshold at runtime.
DOI: https://doi.org/10.1145/3078659.3078669
Published: 2017-06-12
Citations: 2
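
The idea of choosing among filter variants of different accuracy under a quality threshold can be sketched in software. Everything below is an assumption made for illustration: the approximation (zeroing low-order pixel bits), the error metric (mean absolute error against the exact filter), and the names `convolve3x3`, `mean_abs_error` and `pick_variant`. The abstract above describes FPGA partial reconfiguration, which this sketch does not model.

```python
def convolve3x3(image, kernel, drop_bits=0):
    """Valid-mode 3x3 convolution on non-negative integer pixels.

    drop_bits zeroes that many low-order bits of every pixel to mimic a
    cheaper, approximate operator (an assumption made for illustration).
    """
    mask = ~((1 << drop_bits) - 1)
    height, width = len(image), len(image[0])
    out = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * (image[r + dr][c + dc] & mask)
            row.append(acc)
        out.append(row)
    return out


def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def pick_variant(image, kernel, threshold, candidates=(4, 2, 1, 0)):
    """Return the most aggressive drop_bits whose error stays within threshold."""
    exact = convolve3x3(image, kernel)
    for bits in candidates:  # ordered from most to least approximate
        if mean_abs_error(convolve3x3(image, kernel, bits), exact) <= threshold:
            return bits
    return 0


box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
smooth = [[16 * (r + c) for c in range(4)] for r in range(4)]  # multiples of 16
noisy = [[1] * 4 for _ in range(4)]                            # only low bits set
```

With a threshold of 0, `pick_variant(smooth, box, 0)` selects the 4-bit approximation because the dropped bits are already zero, while `pick_variant(noisy, box, 0)` falls back to the exact filter. The choice therefore depends on the input data, which is the property the abstract highlights.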
Robust Mapping of Process Networks to Many-Core Systems using Bio-Inspired Design Centering
G. Hempel, Andrés Goens, J. Castrillón, Josefine Asmus, I. Sbalzarini
Abstract: Embedded systems are often designed as complex architectures with numerous processing elements. Effectively programming such systems requires parallel programming models, e.g., task-based or dataflow-based models. With these types of models, the mapping of the abstract application model to the existing hardware architecture plays a decisive role and is usually optimized to achieve an ideal resource footprint or a near-minimal execution time. However, when mapping several independent programs to the same platform, resource conflicts can arise. This can be circumvented by remapping some of the tasks of an application, which in turn affects its timing behavior, possibly leading to constraint violations. In this work we present a novel method to compute mappings that are robust against local task remapping. The underlying method is based on the bio-inspired design centering algorithm of Lp-Adaptation. We evaluate this with several benchmarks on different platforms and show that mappings obtained with our algorithm are indeed robust. In all experiments, our robust mappings tolerated significantly more run-time perturbations without violating constraints than mappings devised with optimization heuristics.
DOI: https://doi.org/10.1145/3078659.3078667
Published: 2017-06-12
Citations: 5
On the Accuracy of Near-Optimal GPU-Based Path Planning for UAVs
D. Palossi, A. Marongiu, L. Benini
Abstract: Path planning is one of the key functional blocks for any autonomous aerial vehicle (UAV). The goal of a path planner module is to constantly update the route of the vehicle based on information sensed in real-time. Given the high computational requirements of this task, heterogeneous many-cores are appealing candidates for its execution. Approximate path computation has proven a promising approach to reduce total execution time, at the cost of a slight loss in accuracy. In this work we study performance and accuracy of state-of-the-art, near-optimal parallel path planning in combination with program transformations aimed at ensuring efficient use of embedded GPU resources. We propose a profile-based algorithmic variant which boosts GPU execution by up to ≈ 7x, while maintaining the accuracy loss below 5%.
DOI: https://doi.org/10.1145/3078659.3079072
Published: 2017-06-12
Citations: 1
Automatic Conversion of Simulink Models to SysteMoC Actor Networks
Martín Letras, J. Falk, S. Wildermann, J. Teich
Abstract: Simulink has gained a lot of acceptance due to its intuitive block-based algorithm design, simulation, and rapid prototyping capabilities for signal processing as well as control applications. However, automatic code generation for heterogeneous architectures is currently not supported by Simulink. In the literature, there exist automatic translation toolchains for generation of C or C++ code from Simulink models, which are then used for implementation or validation purposes. But few of them approach the generation of models that can be used in well-established Electronic System Level (ESL) design methodologies and tools. In order to address this issue, we present a methodology to extract an executable specification based on Data Flow Graphs (DFGs) from a given Simulink model. Such a specification can then be used by ESL tools to perform a Design Space Exploration (DSE) and generate code for hardware/software partitions directly from the ESL model. In a case study from signal processing, we validate the equivalence of the results of the simulation in Simulink and the results obtained by simulation of the DFG fully automatically generated from the Simulink model in the SystemC-based actor language SysteMoC.
DOI: https://doi.org/10.1145/3078659.3078668
Published: 2017-06-12
Citations: 6