Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems: Latest Publications

OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices
Chen Yu, Sara Royuela, E. Quiñones
DOI: 10.1145/3378678.3391881 | Published: 2020-05-25
Abstract: Heterogeneous computing is increasingly being used in a diversity of computing systems, ranging from HPC to the real-time embedded domain, to cope with performance requirements. Given the variety of accelerators, e.g., FPGAs and GPUs, the use of high-level parallel programming models is desirable to exploit their performance capabilities while maintaining an adequate level of productivity. In that regard, OpenMP is a well-known high-level programming model that incorporates powerful task and accelerator models capable of efficiently exploiting structured and unstructured parallelism in heterogeneous computing. This paper presents a novel compiler transformation technique that automatically transforms OpenMP code into CUDA graphs, combining the programmability benefits of a high-level programming model such as OpenMP with the performance benefits of a low-level programming model such as CUDA. Evaluations have been performed on two NVIDIA GPUs from the HPC and embedded domains, the V100 and the Jetson AGX, respectively.
Citations: 8

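The kind of code such a transformation targets can be illustrated with the CUDA graph API itself. The sketch below is illustrative only: the kernel `scale` is a hypothetical stand-in for an OpenMP target region, the graph is built by hand via stream capture rather than generated by the paper's compiler, and the `cudaGraphInstantiate` call uses the CUDA 11-style signature.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for an OpenMP target region.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);

    // Capture a dependent sequence of kernels into a CUDA graph once...
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    scale<<<(n + 255) / 256, 256, 0, s>>>(d_x, 2.0f, n);  // "task" 1
    scale<<<(n + 255) / 256, 256, 0, s>>>(d_x, 0.5f, n);  // "task" 2, ordered after task 1
    cudaStreamEndCapture(s, &graph);
    // CUDA 11 signature; CUDA 12 drops the error-node/log-buffer parameters.
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);

    // ...then replay it with a single launch per iteration, amortizing launch overhead.
    for (int it = 0; it < 100; ++it)
        cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaFree(d_x);
    return 0;
}
```
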
Programming tensor cores from an image processing DSL
Savvas Sioutas, S. Stuijk, T. Basten, L. Somers, H. Corporaal
DOI: 10.1145/3378678.3391880 | Published: 2020-05-25
Abstract: Tensor Cores (TCUs) are specialized units first introduced by NVIDIA in the Volta microarchitecture in order to accelerate matrix multiplications for deep learning and linear algebra workloads. While these units have proved capable of providing significant speedups for specific applications, their programmability remains difficult for the average user. In this paper, we extend the Halide DSL and compiler with the ability to utilize these units when generating code for a CUDA-based NVIDIA GPGPU. To this end, we introduce a new scheduling directive along with custom lowering passes that automatically transform the Halide AST so that code can be generated for the TCUs. We evaluate the generated code and show that it can achieve over a 5x speedup compared to Halide manual schedules without TCU support, while remaining within 20% of the NVIDIA cuBLAS implementations for mixed-precision GEMM and within 10% of manual CUDA implementations with WMMA intrinsics.
Citations: 5

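The WMMA intrinsics the abstract compares against look roughly as follows. This is a minimal sketch of one warp computing a single 16x16 half-precision tile with a float accumulator, not the code the extended Halide compiler emits; the kernel name and sizes are illustrative.

```cuda
// Build with: nvcc -arch=sm_70 wmma_tile.cu
#include <mma.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
using namespace nvcuda;

// One warp computes a single 16x16 output tile D = A*B with half inputs and a
// float accumulator (the mixed-precision GEMM pattern handled by the TCUs).
__global__ void wmma_tile(const half *A, const half *B, float *D) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);       // leading dimension = 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);         // tensor-core multiply-accumulate
    wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
}

int main() {
    half *A, *B; float *D;
    cudaMalloc(&A, 16 * 16 * sizeof(half));
    cudaMalloc(&B, 16 * 16 * sizeof(half));
    cudaMalloc(&D, 16 * 16 * sizeof(float));
    wmma_tile<<<1, 32>>>(A, B, D);          // exactly one warp
    cudaDeviceSynchronize();
    cudaFree(A); cudaFree(B); cudaFree(D);
    return 0;
}
```
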
Configuring loosely time-triggered wireless control software
Philipp H. Kindt, Sumana Ghosh, S. Chakraborty
DOI: 10.1145/3378678.3391888 | Published: 2020-05-25
Abstract: In many wireless control networks, sensor data and controller data are exchanged periodically, which requires periodic packet transmissions between the physical plant and the controller. As an alternative, event-triggered control paradigms imply that data is only exchanged when there are significant changes in the state of the plant, e.g., because of disturbances. This is the nature of many IoT scenarios and requires that a receiving device listen to the channel for incoming packets at all times. However, especially in mobile networks in which all devices are battery-powered, continuous scanning would drain the battery quickly; hence, reception needs to be duty-cycled. When optimizing such duty-cycled operation, significant energy savings are possible using intelligent software-enabled communication scheduling. In this paper, we propose a wireless transmission scheme that supports loosely time-triggered control. By optimizing the scheduling of transmissions and reception windows in the communication protocol, our proposed scheme allows for energy-efficient communication without requiring strict clock synchronization between the devices. We show that such a scheme is practical and can greatly reduce the energy consumption in event-triggered control applications.
Citations: 1

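To make the duty-cycling idea concrete, the sketch below shows one generic way a receiver can schedule listen windows without strict clock synchronization: sleep between expected transmissions and widen each window by the worst-case clock drift. This is an illustration of the general principle, not the scheme proposed in the paper; all names and parameter values are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative duty-cycled reception schedule: the receiver wakes up around
// each expected transmission and widens its listen window by the drift that
// can accumulate over one period, so no strict clock sync is required.
struct RxWindow { uint64_t start_us, end_us; };

RxWindow next_rx_window(uint64_t last_rx_us, uint64_t period_us,
                        double drift_ppm, uint64_t guard_us) {
    // Worst-case relative drift accumulated over one period, plus a fixed guard.
    uint64_t margin_us =
        static_cast<uint64_t>(period_us * drift_ppm * 1e-6) + guard_us;
    uint64_t expected_us = last_rx_us + period_us;
    return { expected_us - margin_us, expected_us + margin_us };
}

int main() {
    // Example: 100 ms period, +/-50 ppm crystals on both sides, 200 us guard.
    RxWindow w = next_rx_window(1000000, 100000, 2 * 50.0, 200);
    std::printf("listen from %llu us to %llu us\n",
                (unsigned long long)w.start_us, (unsigned long long)w.end_us);
    return 0;
}
```
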
On the implementation and execution of adaptive streaming applications modeled as MADF
Sobhan Niknam, Peng Wang, T. Stefanov
DOI: 10.1145/3378678.3391876 | Published: 2020-05-25
Abstract: It has been shown that mode-aware dataflow (MADF) is an advantageous analysis model for adaptive streaming applications. However, no attention has been paid to how to implement and execute an application, modeled and analyzed with the MADF model, on a Multi-Processor System-on-Chip such that the properties of the analysis model are preserved. Therefore, in this paper, we consider this matter and propose a generic parallel implementation and execution approach for adaptive streaming applications modeled with MADF. Our approach can be easily realized on top of existing operating systems while supporting the utilization of a wider range of schedules. In particular, we demonstrate our approach on LITMUS^RT, one of the existing real-time extensions of the Linux kernel. Finally, to show the practical applicability of our approach and its conformity to the analysis model, we present a case study using a real-life adaptive streaming application.
Citations: 0

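In MADF, an actor's consumption and production rates are fixed per mode and mode switches are signaled over a control channel. The sketch below is a minimal, hypothetical actor loop illustrating that idea only; it is not the paper's implementation or its LITMUS^RT realization, and the modes, rates, and names are invented for illustration.

```cpp
#include <queue>
#include <vector>
#include <cstdio>

// Hypothetical mode-aware actor: a control channel selects the scenario,
// which fixes the consumption rate for the next firing.
enum class Mode { Low, High };

struct Actor {
    std::queue<Mode>   control;  // mode tokens from the mode controller
    std::queue<float>  input;    // data channel
    std::vector<float> output;

    void fire() {
        if (control.empty()) return;
        Mode m = control.front();
        // Scenario-dependent rate: consume 1 token in Low mode, 4 in High mode.
        int rate = (m == Mode::Low) ? 1 : 4;
        if (static_cast<int>(input.size()) < rate) return;  // wait for a full firing
        control.pop();
        float acc = 0.0f;
        for (int i = 0; i < rate; ++i) { acc += input.front(); input.pop(); }
        output.push_back(acc / rate);                        // produce one token
    }
};

int main() {
    Actor a;
    a.control.push(Mode::High);
    for (float v : {1.0f, 2.0f, 3.0f, 4.0f}) a.input.push(v);
    a.fire();
    std::printf("produced %zu token(s), first = %.2f\n", a.output.size(), a.output[0]);
    return 0;
}
```
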
Cross-layer approaches for improving the dependability of deep learning systems
Muhammad Abdullah Hanif, L. Hoang, M. Shafique
DOI: 10.1145/3378678.3391884 | Published: 2020-05-25
Abstract: Deep Neural Networks (DNNs), the state-of-the-art computational models for many Artificial Intelligence (AI) applications, are inherently compute- and resource-intensive and hence cannot exploit traditional redundancy-based fault mitigation techniques for enhancing the dependability of DNN-based systems. Therefore, there is a dire need for alternative methods that can improve their reliability without a high expenditure of resources by exploiting the intrinsic characteristics of these networks. In this paper, we present cross-layer approaches that, based on the intrinsic characteristics of DNNs, employ software- and hardware-level modifications for improving the resilience of DNN-based systems to hardware-level faults, e.g., soft errors and permanent faults.
Citations: 1

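One representative software-level technique from this research area, shown here purely as an illustration and not necessarily the specific modification used in the paper, is restricting activations to a range profiled on fault-free runs so that a bit flip producing an extreme value cannot propagate through later layers.

```cpp
#include <algorithm>
#include <vector>
#include <cstdio>

// Illustrative range-restriction pass: clamp each activation to bounds profiled
// offline, so a soft error that flips a high-order bit cannot inject an extreme
// value into subsequent layers.
void clamp_activations(std::vector<float>& act, float lo, float hi) {
    for (float& v : act)
        v = std::min(std::max(v, lo), hi);
}

int main() {
    std::vector<float> act = {0.3f, 1.7f, 9999.0f /* bit-flip corrupted */};
    clamp_activations(act, 0.0f, 6.0f);   // hypothetical profiled bounds
    for (float v : act) std::printf("%.1f ", v);
    std::printf("\n");
    return 0;
}
```
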
Scheduling of moldable fork-join tasks with inter- and intra-task communications
Hiroki Nishikawa, Kaname Shimada, Ittetsu Taniguchi, H. Tomiyama
DOI: 10.1145/3378678.3391875 | Published: 2020-05-25
Abstract: This paper proposes scheduling techniques for moldable fork-join tasks on multicore architectures. The proposed techniques decide the number of cores and the execution start time for each task during scheduling and mapping, taking into account inter- and intra-task communications. The techniques, based on an integer programming formulation, aim at minimizing the overall schedule length. Experimental results are compared with state-of-the-art techniques.
Citations: 1

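A generic, illustrative core of such a makespan-minimizing formulation (not the paper's exact model, which additionally handles intra-task communication and the mapping of tasks to specific cores) is sketched below, with s_i the start time of task i, p_i its number of assigned cores, e_i(p_i) its execution time on p_i cores, c_ij the inter-task communication delay on precedence edge (i,j), M the number of cores, and C_max the schedule length. In a real ILP the time-varying resource constraint is linearized, e.g., with time-indexed or pairwise-overlap binary variables.

```latex
\begin{align*}
\min\;\; & C_{\max} \\
\text{s.t.}\;\; & s_i + e_i(p_i) \le C_{\max} && \forall i \in V \\
& s_i + e_i(p_i) + c_{ij} \le s_j && \forall (i,j) \in E \\
& \textstyle\sum_{i\ \text{running at}\ t} p_i \le M && \forall t \\
& p_i \in \{1,\dots,M\},\;\; s_i \ge 0 && \forall i \in V
\end{align*}
```
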
A secure hardware-software solution based on RISC-V, logic locking and microkernel
Dominik Sisejkovic, Farhad Merchant, Lennart M. Reimann, R. Leupers, M. Giacometti, Sascha Kegreiss
DOI: 10.1145/3378678.3391886 | Published: 2020-05-25
Abstract: In this paper we present the first generation of a secure platform developed by following a security-by-design approach. The security of the platform is built on top of two pillars: a secured hardware design flow and a secure microkernel. The hardware design is protected against the insertion of hardware Trojans during the production phase through netlist obfuscation provided by logic locking. The software stack is based on a trustworthy and verified microkernel. Moreover, the system is expected to work in an environment which does not allow physical access to the device; therefore, in-the-field attacks are only possible via software. We present a solution whose security has been achieved by relying on simple and open hardware and software solutions, namely a RISC-V processor core, open-source peripherals, and an seL4-based operating system.
Citations: 12

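Logic locking inserts key-controlled gates into the netlist so that the circuit only computes the intended function when the secret key is applied. The snippet below is a toy software illustration of a single XNOR key gate, shown only to convey the principle; real logic locking operates on gate-level netlists during the hardware design flow, not on C++ code, and this is not the locking scheme used in the paper.

```cpp
#include <cstdio>

// Toy XNOR key gate inserted after an AND gate: the locked circuit computes
// the original function only with the correct key bit (here 1); with a wrong
// key the output is inverted, corrupting the circuit's behavior.
bool locked_and(bool a, bool b, bool key_bit) {
    bool original = a && b;          // original gate
    return !(original ^ key_bit);    // key gate; correct iff key_bit == 1
}

int main() {
    for (int k = 0; k <= 1; ++k)
        std::printf("key=%d: 1 AND 1 -> %d\n", k, (int)locked_and(true, true, k));
    return 0;
}
```
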
Reviewing inference performance of state-of-the-art deep learning frameworks
Berk Ulker, S. Stuijk, H. Corporaal, R. Wijnhoven
DOI: 10.1145/3378678.3391882 | Published: 2020-05-25
Abstract: Deep learning models have replaced conventional methods for machine learning tasks. Efficient inference on edge devices with limited resources is key for broader deployment. In this work, we focus on the tool selection challenge for inference deployment. We present an extensive evaluation of the inference performance of deep learning software tools using state-of-the-art CNN architectures on multiple hardware platforms. We benchmark these hardware-software pairs for a broad range of network architectures, inference batch sizes, and floating-point precisions, focusing on latency and throughput. Our results reveal interesting combinations for optimal tool selection, resulting in different optima when considering minimum latency and maximum throughput.
Citations: 13

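The latency/throughput methodology can be illustrated with a generic timing harness like the one below; it is a sketch only, the `benchmark` helper and the stubbed inference call are hypothetical and do not reflect the paper's measurement setup.

```cpp
#include <chrono>
#include <cstdio>
#include <functional>

// Generic benchmarking harness: warm-up runs first, then timed runs.
// `infer` stands in for one framework inference call at a given batch size.
void benchmark(const std::function<void()>& infer, int batch,
               int warmup = 10, int runs = 100) {
    for (int i = 0; i < warmup; ++i) infer();
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) infer();
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count() / runs;
    std::printf("batch %d: latency %.3f ms, throughput %.1f samples/s\n",
                batch, ms, batch * 1000.0 / ms);
}

int main() {
    // Dummy workload standing in for a real framework invocation.
    auto fake_infer = [] { volatile double x = 0; for (int i = 0; i < 100000; ++i) x += i; };
    benchmark(fake_infer, /*batch=*/8);
    return 0;
}
```
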
Real-time audio processing for hearing aids using a model-based bayesian inference framework
M. Roa-Villescas, B. Vries, S. Stuijk, H. Corporaal
DOI: 10.1145/3378678.3397528 | Published: 2020-05-25
Abstract: Development of hearing aid (HA) signal processing algorithms entails an iterative process between two design steps, namely algorithm development and the embedded implementation. Algorithm designers favor high-level programming languages for several reasons, including higher productivity, code readability and, perhaps most importantly, the availability of state-of-the-art signal processing frameworks that open new research directions. Embedded software, on the other hand, is preferably implemented using a low-level programming language to allow finer control of the hardware, an essential trait in real-time processing applications. In this paper we present a technique that allows deploying DSP algorithms written in Julia, a modern high-level programming language, on a real-time HA processing platform known as openMHA. We demonstrate this technique by using a model-based Bayesian inference framework to perform real-time audio processing.
Citations: 3

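At its core, this kind of deployment means calling Julia-defined DSP code from a C++ processing callback. One generic way to do that, shown here only as an illustration and not as the paper's mechanism or the openMHA plugin API, is Julia's C embedding interface; the `gain` function and the scalar-per-call processing are placeholders, and real-time concerns (precompilation, allocation-free calls) are deliberately ignored.

```cpp
// Build with the Julia embedding flags, e.g.:
//   g++ embed.cpp -I$JULIA_DIR/include/julia -L$JULIA_DIR/lib -ljulia
#include <julia.h>
#include <cstdio>

int main() {
    jl_init();                                     // start the Julia runtime once
    jl_eval_string("gain(x) = 0.5 * x");           // stand-in for a real DSP algorithm
    jl_function_t *gain = jl_get_function(jl_main_module, "gain");

    double sample = 0.8;                           // one sample, for brevity
    jl_value_t *out = jl_call1(gain, jl_box_float64(sample));
    std::printf("processed sample: %f\n", jl_unbox_float64(out));

    jl_atexit_hook(0);
    return 0;
}
```
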
Exploration of GPU sharing policies under GEMM workloads
Ioannis Oroutzoglou, Dimosthenis Masouros, Konstantina Koliogeorgi, S. Xydis, D. Soudris
DOI: 10.1145/3378678.3391887 | Published: 2020-05-25
Abstract: Lately, cloud computing has seen explosive growth due to the flexibility and scalability it offers. The ever-increasing computational demands, especially from the machine learning domain, have forced cloud operators to enhance their infrastructure with acceleration devices, such as General-Purpose (GP)GPUs or FPGAs. Even though multi-tenancy has been widely examined for conventional CPUs, this is not the case for accelerators. Current solutions support "one accelerator per user" schemes, which can lead to both under-utilization and starvation of available resources. In this work, we analyze the potential of GPU sharing inside data-center environments. We investigate how several architectural features affect the performance of GPUs under different multi-tenant stressing scenarios. We compare CUDA MPS with the native, default CUDA scheduler and also with Vinetalk, a research framework providing GPU sharing capabilities. Experimental results show that NVIDIA's MPS achieves the best performance in multi-application scenarios, specifically up to 4.5x and 11.2x better than the native CUDA scheduler and Vinetalk, respectively.
Citations: 1

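The co-located GEMM workloads studied here can be approximated with a sketch like the one below: two cuBLAS SGEMMs issued on separate streams from one process. This is an illustration of the sharing pattern, not the paper's benchmark harness; matrix contents are left uninitialized and error checking is omitted for brevity.

```cuda
// Build with: nvcc gemm_share.cu -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 1024;
    const float alpha = 1.0f, beta = 0.0f;
    float *A, *B, *C1, *C2;
    cudaMalloc(&A,  n * n * sizeof(float));
    cudaMalloc(&B,  n * n * sizeof(float));
    cudaMalloc(&C1, n * n * sizeof(float));
    cudaMalloc(&C2, n * n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    cublasHandle_t h1, h2;
    cublasCreate(&h1); cublasSetStream(h1, s1);
    cublasCreate(&h2); cublasSetStream(h2, s2);

    // Two "tenants" each computing C = A * B, issued concurrently.
    cublasSgemm(h1, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, A, n, B, n, &beta, C1, n);
    cublasSgemm(h2, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, A, n, B, n, &beta, C2, n);
    cudaDeviceSynchronize();

    cublasDestroy(h1); cublasDestroy(h2);
    cudaStreamDestroy(s1); cudaStreamDestroy(s2);
    cudaFree(A); cudaFree(B); cudaFree(C1); cudaFree(C2);
    return 0;
}
```

Under MPS, the same two GEMMs would typically come from separate client processes, with the MPS control daemon (started via nvidia-cuda-mps-control -d) multiplexing them onto a single GPU context.
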