Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems: Latest Publications

MASES: Mobility And Slack Enhanced Scheduling For Latency-Optimized Pipelined Dataflow Graphs

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207733

Wenxiao Yu, Jacob Kornerup, A. Gerstlauer

Abstract: Dataflow and task graph descriptions are widely used for mapping and scheduling of real-time streaming applications onto heterogeneous processing platforms. Such applications are often characterized by the need to process large-volume data streams with not only high throughput, but also low latency. Mapping such application descriptions into tightly constrained implementations requires optimization of pipelined scheduling of tasks on different processing elements. This poses the problem of finding an optimal solution across a latency-throughput objective space. In this paper, we present a novel list-scheduling-based heuristic called MASES for pipelined dataflow scheduling to minimize latency under throughput and heterogeneous resource constraints. MASES explores the flexibility provided by the mobility and slack of actors in a partial schedule. It can find a valid schedule, if one exists, even under tight throughput and resource constraints. Furthermore, MASES can improve runtime by up to 4x while achieving results similar to those of other latency-oriented heuristics for the problems they can solve.

Citations: 1
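Mobility, one of the two quantities MASES exploits, is conventionally the gap between an actor's as-late-as-possible (ALAP) and as-soon-as-possible (ASAP) start times. The Python sketch below computes this textbook quantity for a single, non-pipelined iteration of a task graph. It ignores throughput, heterogeneous resources, and slack, so it is not the MASES heuristic itself. The helper name and the example graph are hypothetical. The sketch requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def asap_alap_mobility(exec_time, succs):
    """Textbook ASAP/ALAP mobility for one iteration of a DAG of actors.

    exec_time: actor -> execution time; succs: actor -> list of successors.
    Hypothetical helper for illustration; not the MASES algorithm.
    """
    preds = {v: [] for v in exec_time}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    # TopologicalSorter takes a node -> predecessors mapping.
    order = list(TopologicalSorter(preds).static_order())

    # ASAP: start as soon as every predecessor has finished.
    asap = {}
    for v in order:
        asap[v] = max((asap[u] + exec_time[u] for u in preds[v]), default=0)
    makespan = max(asap[v] + exec_time[v] for v in order)

    # ALAP: start as late as possible without stretching the makespan.
    alap = {}
    for v in reversed(order):
        latest_finish = min((alap[w] for w in succs.get(v, [])), default=makespan)
        alap[v] = latest_finish - exec_time[v]

    mobility = {v: alap[v] - asap[v] for v in order}
    return asap, alap, mobility


exec_time = {"A": 2, "B": 3, "C": 1, "D": 2}
succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
asap, alap, mobility = asap_alap_mobility(exec_time, succs)
print(mobility)  # C can be delayed by 2 time units; A, B and D are critical
```

A list scheduler typically gives the highest priority to low-mobility actors, since zero-mobility actors lie on the critical path.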

Toward Efficient Many-core Scheduling of Partial Expansion Graphs

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207734

Hai Nam Tran, S. Bhattacharyya, J. Talpin, T. Gautier

Citations: 0

Optimizing Worst-Case Execution Times Using Mainstream Compilers

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207739

M. Becker, S. Chakraborty

Abstract: Compiler optimizations are widely used to enhance the average-case performance of software; these techniques are very effective and advance with every compiler version. However, in real-time systems, it is the worst-case performance that matters. While there are techniques that aim at reducing the worst-case execution time (WCET), most of them are specific to certain targets and not implemented in mainstream compilers. In this paper, we present our ongoing work on a generic approach to harness the power of existing compiler optimizations for WCET reduction. Our approach is based on an existing compiler technology called Feedback-Directed Optimization (FDO). FDO can reduce the execution time of a program by making use of profiling data, and it has recently become popular due to major improvements. We first introduce a static analysis to efficiently compute a worst-case timing profile based on control flow dominators. During this analysis we perform a minimal number of automated calls to a WCET analyzer. The resulting profile contains basic block and branch execution counts, which can then be used in the regular FDO workflow. Preliminary results show that significant WCET reductions are possible, but they depend on many factors that need more investigation.

Citations: 2
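The abstract states that the worst-case timing profile consists of basic-block and branch execution counts that feed the regular FDO workflow. The Python sketch below is a much simpler illustration of what such a profile contains. It derives those counts from a single worst-case path, assumed to be reported by a WCET analyzer as a sequence of basic-block labels. It does not reproduce the paper's dominator-based analysis, and the helper name and block labels are hypothetical.

```python
from collections import Counter


def worst_case_profile(wcet_path):
    """Derive block and branch (CFG edge) counts from one worst-case path.

    wcet_path: basic-block labels in execution order, assumed to come
    from a WCET analyzer. Hypothetical helper for illustration only.
    """
    block_counts = Counter(wcet_path)
    # Each consecutive pair of blocks is one traversal of a CFG edge.
    edge_counts = Counter(zip(wcet_path, wcet_path[1:]))
    return block_counts, edge_counts


# A loop whose body runs twice on the worst-case path.
path = ["entry", "loop", "body", "loop", "body", "loop", "exit"]
blocks, edges = worst_case_profile(path)
print(blocks["loop"], edges[("loop", "body")])  # prints: 3 2
```

Counts in this form would still have to be written in whatever profile format the target compiler's FDO pipeline expects. That conversion step is compiler-specific and is not shown.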

Control Flow Vectorization for ARM NEON

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207721

Angela Pohl, Biagio Cosenza, B. Juurlink

Abstract: Single Instruction Multiple Data (SIMD) extensions in processors enable in-core parallelism for operations on vectors of data. From the compiler perspective, SIMD instructions require automatic techniques to determine how and when it is possible to express computations in terms of vector operations. When this is not possible automatically, a user may still write code in a manner that allows the compiler to deduce that vectorization is possible, or explicitly define how to vectorize by using intrinsics. This work analyzes the challenge of generating efficient vector instructions by benchmarking 151 loop patterns with three compilers on two SIMD instruction sets. Comparing the vectorization rates for the AVX2 and NEON instruction sets, we observed that the presence of control flow poses a major problem for vectorization on NEON. We consequently propose a set of solutions to generate efficient vector instructions in the presence of control flow. In particular, we show how to overcome the lack of masked load and store instructions with different code generation strategies. Results show that we enable vectorization of conditional read operations with minimal overhead, while our technique of atomic select stores achieves a speedup of more than 2x over the state of the art for large vectorization factors.

Citations: 8
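Without masked stores, a conditional store such as `if (mask[i]) dst[i] = src[i]` can still be vectorized in three steps:

1. Load the full destination vector.
2. Select lane-wise between old and new values.
3. Store the whole vector back.

The Python sketch below emulates that load-select-store pattern lane by lane, with a hypothetical vectorization factor `vf` and a scalar epilogue for leftover elements. It is a generic illustration, not the paper's code. Because it also rewrites unselected lanes, this plain form assumes that no other thread writes `dst` concurrently.

```python
def select_store(dst, src, mask, vf):
    """Emulate a masked store dst[i] = src[i] where mask[i], using
    full-width vector load/select/store (no masked store instruction)."""
    n = len(dst)
    main = n - n % vf                      # elements covered by whole vectors
    for i in range(0, main, vf):
        old = dst[i:i + vf]                # unconditional vector load of dst
        new = src[i:i + vf]
        lanes = mask[i:i + vf]
        # Lane-wise select (a bitwise select such as NEON's vbslq_* on real
        # hardware), followed by an unconditional full-width store.
        dst[i:i + vf] = [s if m else o for o, s, m in zip(old, new, lanes)]
    for i in range(main, n):               # scalar epilogue for the remainder
        if mask[i]:
            dst[i] = src[i]
    return dst


print(select_store([0] * 6, [1, 2, 3, 4, 5, 6],
                   [True, False, True, True, False, True], vf=4))
# prints: [1, 0, 3, 4, 0, 6]
```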

Exploiting Specification Modularity to Prune the Optimization-Space of Manufacturing Systems

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207728

J. Bastos, S. Stuijk, J. Voeten, R. Schiffelers, H. Corporaal

Citations: 3

Measuring and Modeling Energy Consumption of Embedded Systems for Optimizing Compilers

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207729

Mikko Roth, Arno Luppold, H. Falk

Citations: 10

Restricted Scheduling Windows for Dynamic Fault-Tolerant Primary/Backup Approach-Based Scheduling on Embedded Systems

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207724

Petr Dobiáš, E. Casseau, O. Sinnen

Citations: 2

On the Cost of Freedom From Interference in Heterogeneous SoCs

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207735

Björn Forsberg, L. Benini, A. Marongiu

Citations: 1

Automatic Kernel Fusion for Image Processing DSLs

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207723

Bo Qiao, Oliver Reiche, Frank Hannig, J. Teich

Abstract: Programming image processing algorithms on hardware accelerators such as graphics processing units (GPUs) often exhibits a trade-off between software portability and performance portability. Domain-specific languages (DSLs) have proven to be a promising remedy: they enable optimizations and the generation of efficient code from a concise, high-level algorithm representation. The scope of this paper is an optimization framework for image processing DSLs in the form of a source-to-source compiler. GPU applications are often bound by inter-kernel communication through global memory. To cope with this, kernel fusion is investigated as a primary optimization technique to improve temporal locality. In order to enable automatic kernel fusion, we analyze the fusibility of each kernel in the algorithm in terms of data dependencies, resource utilization, and parallelism granularity. We combine the obtained information with the domain-specific knowledge captured in the DSL. On this basis, a method to automatically fuse suitable kernels is proposed and integrated into an open-source DSL framework. The novel kernel fusion technique is evaluated on two filter-based image processing applications, for which speedups of up to 1.60x are obtained on an NVIDIA GeForce 745 graphics card.

Citations: 18
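The benefit of kernel fusion is easiest to see for two point operators applied in sequence. Unfused, the first kernel writes a full intermediate image to memory and the second kernel reads it back. Fused, each pixel passes through both operators in a single pass. The Python sketch below is a CPU-side illustration of that idea only, with hypothetical operators `scale` and `offset`. It is not the source-to-source transformation described in the paper. It also omits stencil (local) operators, whose neighborhood dependencies are what make a fusibility analysis necessary.

```python
def run_unfused(img, f, g):
    """Two separate kernels with an intermediate image between them."""
    tmp = [[f(p) for p in row] for row in img]   # kernel 1 writes tmp
    return [[g(p) for p in row] for row in tmp]  # kernel 2 reads tmp back


def run_fused(img, f, g):
    """One fused kernel: no intermediate image is materialized."""
    return [[g(f(p)) for p in row] for row in img]


def scale(p):   # hypothetical point operator
    return 2 * p


def offset(p):  # hypothetical point operator
    return p + 1


img = [[1, 2], [3, 4]]
print(run_fused(img, scale, offset))  # prints: [[3, 5], [7, 9]]
```

For a chain of point operators whose intermediate image has no other consumer, fusion preserves the results exactly. For a stencil consumer, each output pixel needs a neighborhood of producer outputs, so a fused kernel must recompute or cache those halo pixels.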

Towards a verified Lustre compiler with modular reset

Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems Pub Date : 2018-05-28 DOI: 10.1145/3207719.3207732

T. Bourke, Lélio Brun, Marc Pouzet

Citations: 3