Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems: Latest Articles

MASES: Mobility And Slack Enhanced Scheduling For Latency-Optimized Pipelined Dataflow Graphs
Wenxiao Yu, Jacob Kornerup, A. Gerstlauer
Abstract: Dataflow and task graph descriptions are widely used for mapping and scheduling of real-time streaming applications onto heterogeneous processing platforms. Such applications are often characterized by the need to process large-volume data streams with not only high throughput, but also low latency. Mapping such application descriptions into tightly constrained implementations requires optimization of pipelined scheduling of tasks on different processing elements. This poses the problem of finding an optimal solution across a latency-throughput objective space. In this paper, we present a novel list-scheduling based heuristic called MASES for pipelined dataflow scheduling to minimize latency under throughput and heterogeneous resource constraints. MASES explores the flexibility provided by mobility and slack of actors in a partial schedule. It can find a valid schedule if one exists even under tight throughput and resource constraints. Furthermore, MASES can improve runtime by up to 4x while achieving results similar to those of other latency-oriented heuristics for problems they can solve.
DOI: https://doi.org/10.1145/3207719.3207733
Published: 2018-05-28
Citations: 1
Toward Efficient Many-core Scheduling of Partial Expansion Graphs
Hai Nam Tran, S. Bhattacharyya, J. Talpin, T. Gautier
Abstract: Transformation of synchronous data flow graphs (SDF) into equivalent homogeneous SDF representations has been extensively applied as a pre-processing stage when mapping signal processing algorithms onto parallel platforms. While this transformation helps fully expose task and data parallelism, it also presents several limitations such as an exponential increase in the number of actors and excessive communication overhead. Partial expansion graphs were introduced to address these limitations for multi-core platforms. However, existing solutions are not well-suited to achieve efficient scheduling on many-core architectures. In this article, we develop a new approach that employs cyclo-static data flow techniques to provide a simple but efficient method of coordinating the data production and consumption in the expanded graphs. We demonstrate the advantage of our approach through experiments on real application models.
DOI: https://doi.org/10.1145/3207719.3207734
Published: 2018-05-28
Citations: 0
Optimizing Worst-Case Execution Times Using Mainstream Compilers
M. Becker, S. Chakraborty
Abstract: Compiler optimizations are widely used to enhance the average case performance of software, and these techniques are very effective and advance with every compiler version. However, in real-time systems, it is the worst-case performance that matters. While there are techniques that aim at reducing the worst-case execution time (WCET), most of them are specific to certain targets and not implemented in mainstream compilers. In this paper, we present our ongoing work for a generic approach to harness the power of existing compiler optimizations for WCET reduction. Our approach is based on an existing compiler technology called Feedback-Directed Optimization (FDO), which can reduce the execution time of a program by making use of profiling data, and recently became popular due to major improvements. We first introduce a static analysis to efficiently compute a worst-case timing profile based on control flow dominators. During this analysis we perform a minimal number of automated calls to a WCET analyzer. The resulting profile contains basic block and branch execution counts, which then can be used in the regular FDO workflow. Preliminary results show that significant WCET reductions are possible, but depend on many factors that need more investigation.
DOI: https://doi.org/10.1145/3207719.3207739
Published: 2018-05-28
Citations: 2
Control Flow Vectorization for ARM NEON
Angela Pohl, Biagio Cosenza, B. Juurlink
Abstract: Single Instruction Multiple Data (SIMD) extensions in processors enable in-core parallelism for operations on vectors of data. From the compiler perspective, SIMD instructions require automatic techniques to determine how and when it is possible to express computations in terms of vector operations. When this is not possible automatically, a user may still write code in a manner that allows the compiler to deduce that vectorization is possible, or explicitly define how to vectorize by using intrinsics. This work analyzes the challenge of generating efficient vector instructions by benchmarking 151 loop patterns with three compilers on two SIMD instruction sets. Comparing the vectorization rates for the AVX2 and NEON instruction sets, we observed that the presence of control flow poses a major problem for the vectorization on NEON. We consequently propose a set of solutions to generate efficient vector instructions in the presence of control flow. In particular, we show how to overcome the lack of masked load and store instructions with different code generation strategies. Results show that we enable vectorization of conditional read operations with a minimal overhead, while our technique of atomic select stores achieves a speedup of more than 2x over state of the art for large vectorization factors.
DOI: https://doi.org/10.1145/3207719.3207721
Published: 2018-05-28
Citations: 8
Exploiting Specification Modularity to Prune the Optimization-Space of Manufacturing Systems
J. Bastos, S. Stuijk, J. Voeten, R. Schiffelers, H. Corporaal
Abstract: In this paper we address the makespan optimization of industrial-sized manufacturing systems. We introduce a framework which specifies functional system requirements in a compositional way and automatically computes makespan optimal solutions respecting these requirements. We show the optimization problem to be NP-Hard. To scale towards systems of industrial complexity, we propose a novel approach based on a subclass of compositional requirements which we call constraints. We prove that these constraints always prune the worst-case optimization-space thereby increasing the odds of finding an optimal solution (with respect to the additional constraints). We demonstrate the applicability of the framework on an industrial-sized manufacturing system.
DOI: https://doi.org/10.1145/3207719.3207728
Published: 2018-05-28
Citations: 3
Measuring and Modeling Energy Consumption of Embedded Systems for Optimizing Compilers
Mikko Roth, Arno Luppold, H. Falk
Abstract: Estimating energy consumption already during development as precisely as possible is crucial for many embedded system designs. These energy estimates should be expressed such that they can be used by subsequent automated optimizations during the compilation phase in order to minimize the expected energy consumption. In this paper we present our current approach on measuring and modeling, and subsequently using the derived energy estimates. Our model is implemented within an optimizing compiler, allowing for future energy focused compiler optimizations.
DOI: https://doi.org/10.1145/3207719.3207729
Published: 2018-05-28
Citations: 10
Restricted Scheduling Windows for Dynamic Fault-Tolerant Primary/Backup Approach-Based Scheduling on Embedded Systems
Petr Dobiáš, E. Casseau, O. Sinnen
Abstract: This paper is aimed at studying fault-tolerant design of real-time multi-processor systems and is in particular concerned with the dynamic mapping and scheduling of tasks on embedded systems. The effort is concentrated on a scheduling strategy having reduced complexity and guaranteeing that, when a task is input into the system and accepted, then it is correctly executed prior to the task deadline. The chosen method makes use of the primary/backup approach and this paper describes its refinement based on reduction of windows within which the primary and the backup copies can be scheduled. The results show that the use of restricted scheduling windows reduces the algorithm complexity by up to 15%.
DOI: https://doi.org/10.1145/3207719.3207724
Published: 2018-05-28
Citations: 2
On the Cost of Freedom From Interference in Heterogeneous SoCs
Björn Forsberg, L. Benini, A. Marongiu
Abstract: In heterogeneous CPU+GPU SoCs where a single DRAM is shared between both devices, concurrent memory accesses from both devices can lead to slowdowns due to memory interference. This prevents the deployment of real-time tasks, which need to be guaranteed to complete before a set deadline. However, freedom from interference can be guaranteed through software memory scheduling, but may come at a significant cost due to frequent CPU-GPU synchronizations. In this paper we provide a compile-time model to help developers make informed decisions on how to achieve freedom from interference at the lowest cost.
DOI: https://doi.org/10.1145/3207719.3207735
Published: 2018-05-28
Citations: 1
Automatic Kernel Fusion for Image Processing DSLs
Bo Qiao, Oliver Reiche, Frank Hannig, J. Teich
Abstract: Programming image processing algorithms on hardware accelerators such as graphics processing units (GPUs) often exhibits a trade-off between software portability and performance portability. Domain-specific languages (DSLs) have proven to be a promising remedy, which enable optimizations and generation of efficient code from a concise, high-level algorithm representation. The scope of this paper is an optimization framework for image processing DSLs in the form of a source-to-source compiler. To cope with the inter-kernel communication bound via global memory for GPU applications, kernel fusion is investigated as a primary optimization technique to improve temporal locality. In order to enable automatic kernel fusion, we analyze the fusibility of each kernel in the algorithm, in terms of data dependencies, resource utilization, and parallelism granularity. By combining the obtained information with the domain-specific knowledge captured in the DSL, a method to automatically fuse the suitable kernels is proposed and integrated into an open source DSL framework. The novel kernel fusion technique is evaluated on two filter-based image processing applications, for which speedups of up to 1.60 are obtained for an NVIDIA GeForce 745 graphics card target.
DOI: https://doi.org/10.1145/3207719.3207723
Published: 2018-05-28
Citations: 18
Towards a verified Lustre compiler with modular reset
T. Bourke, Lélio Brun, Marc Pouzet
Abstract: This paper presents ongoing work to add a modular reset construct to a verified Lustre compiler. We present a novel formal specification for the construct and sketch our plans to integrate it into the compiler and its correctness proof.
DOI: https://doi.org/10.1145/3207719.3207732
Published: 2018-05-28
Citations: 3