Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems最新文献

筛选
英文 中文
PANDORA: a parallelizing approximation-discovery framework (WIP paper) PANDORA:一个并行化近似发现框架(WIP论文)
G. Stitt, David Campbell
{"title":"PANDORA: a parallelizing approximation-discovery framework (WIP paper)","authors":"G. Stitt, David Campbell","doi":"10.1145/3316482.3326345","DOIUrl":"https://doi.org/10.1145/3316482.3326345","url":null,"abstract":"In this paper, we introduce PANDORA---a framework that complements existing parallelizing compilers by automatically discovering application- and architecture-specialized approximations. We demonstrate that PANDORA creates approximations that extract massive amounts of parallelism from inherently sequential code by eliminating loop-carried dependencies---a long-time goal of the compiler research community. Compared to exact parallel baselines, preliminary results show speedups ranging from 2.3x to 81x with acceptable error for many usage scenarios.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124905733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
A compiler-based approach for GPGPU performance calibration using TLP modulation (WIP paper) 基于编译器的TLP调制GPGPU性能标定方法(WIP论文)
Yongseung Yu, Seokwon Kang, Yongjun Park
{"title":"A compiler-based approach for GPGPU performance calibration using TLP modulation (WIP paper)","authors":"Yongseung Yu, Seokwon Kang, Yongjun Park","doi":"10.1145/3316482.3326343","DOIUrl":"https://doi.org/10.1145/3316482.3326343","url":null,"abstract":"Modern GPUs are the most successful accelerators as they provide outstanding performance gain by using CUDA or OpenCL programming models. For maximum performance, programmers typically try to maximize the number of thread blocks of target programs, and GPUs also generally attempt to allocate the maximum number of thread blocks to their GPU cores. However, many recent studies have pointed out that simply allocating the maximum number of thread blocks to GPU cores does not always guarantee the best performance, and identifying proper number of thread blocks per GPU core is a major challenge. Despite these studies, most existing architectural techniques cannot be directly applied to current GPU hardware, and the optimal number of thread blocks can vary significantly depending on the target GPU and application characteristics. To solve these problems, this study proposes a just-in-time thread block number adjustment system using CUDA binary modification upon an LLVM compiler framework, referred to as the CTA-Limiter, in order to dynamically maximize GPU performance on real GPUs without reprogramming. The framework gradually reduces the number of concurrent thread blocks of target CUDA workloads using extra shared memory allocation, and compares the execution time with the previous version to automatically identify the optimal number of co-running thread blocks per GPU Core. The results showed meaningful performance improvements, averaging at 30%, 40%, and 44%, in GTX 960, GTX 1050, and GTX 1080 Ti, respectively.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129410883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Raising binaries to LLVM IR with MCTOLL (WIP paper) 使用MCTOLL将二进制文件提升到LLVM IR (WIP论文)
S. B. Yadavalli, Aaron Smith
{"title":"Raising binaries to LLVM IR with MCTOLL (WIP paper)","authors":"S. B. Yadavalli, Aaron Smith","doi":"10.1145/3316482.3326354","DOIUrl":"https://doi.org/10.1145/3316482.3326354","url":null,"abstract":"The need to analyze and execute binaries from legacy ISAs on new or different ISAs has been addressed in a variety of ways over the past few decades. Solutions using complementary static and dynamic binary translation techniques have been deployed in most real-world situations. As new ISAs are designed and legacy ISAs re-examined, the need for binary translation infrastructure re-emerges, and needs to be re- engineered all over again. Work is in progress with a goal to make such re-engineering efforts easier by using some of the software tools that would irrespectively be developed or available for a new or existing ISA. To that end, this paper presents a static binary raiser that translates binaries to LLVM IR. Native binaries for a new ISA are generated from the raised LLVM IR using the LLVM compiler backend. This technique enables development of a single raiser per legacy ISA, irrespective of the new target ISA. The result of such a raiser can then leverage compiler back-ends of new ISAs, thus simplifying the development of binary translator for the new ISA . This work leverages the existing LLVM infrastructure to implement a static raiser that currently supports raising x64 and Arm32 binaries to LLVM IR. The raiser is built as an LLVM tool – similar to llvm-objdump or clang and does not have any dependencies outside of those needed to build LLVM. This paper describes the phases of the raiser and gives the current status and limitations.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129066169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
ApproxSymate: path sensitive program approximation using symbolic execution 使用符号执行的路径敏感程序近似
Himeshi De Silva, A. Santosa, Nhut-Minh Ho, W. Wong
{"title":"ApproxSymate: path sensitive program approximation using symbolic execution","authors":"Himeshi De Silva, A. Santosa, Nhut-Minh Ho, W. Wong","doi":"10.1145/3316482.3326341","DOIUrl":"https://doi.org/10.1145/3316482.3326341","url":null,"abstract":"Approximate computing, a technique that forgoes quantifiable output accuracy in favor of performance gains, is useful for improving the energy efficiency of error-resilient software, especially in the embedded setting. The identification of program components that can tolerate error plays a crucial role in balancing the energy vs. accuracy trade off in approximate computing. Manual analysis for approximability is not scalable and therefore automated tools which employ static or dynamic analysis have been proposed. However, static techniques are often coarse in their approximations while dynamic efforts incur high overhead. In this work we present ApproxSymate, a framework for automatically identifying program approximations using symbolic execution. ApproxSymate first statically computes symbolic error expressions for program components and then uses a dynamic sensitivity analysis to compute their approximability. A unique feature of this tool is that it explores the previously not considered dimension of program path for approximation which enables safer transformations. Our evaluation shows that ApproxSymate averages about 96% accuracy in identifying the same approximations found in manually annotated benchmarks, outperforming existing automated techniques.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114832171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
From Java to real-time Java: a model-driven methodology with automated toolchain (invited paper) 从Java到实时Java:带有自动化工具链的模型驱动方法(特邀论文)
Wanli Chang, Shuai Zhao, Ran Wei, A. Wellings, A. Burns
{"title":"From Java to real-time Java: a model-driven methodology with automated toolchain (invited paper)","authors":"Wanli Chang, Shuai Zhao, Ran Wei, A. Wellings, A. Burns","doi":"10.1145/3316482.3326360","DOIUrl":"https://doi.org/10.1145/3316482.3326360","url":null,"abstract":"Real-time systems are receiving increasing attention with the emerging application scenarios that are safety-critical, complex in functionality, high on timing-related performance requirements, and cost-sensitive, such as autonomous vehicles. Development of real-time systems is error-prone and highly dependent on the sophisticated domain expertise, making it a costly process. There is a trend of the existing software without the real-time notion being re-developed to realise real-time features, e.g., in the big data technology. This paper utilises the principles of model-driven engineering (MDE) and proposes the first methodology that automatically converts standard time-sharing Java applications to real-time Java applications. It opens up a new research direction on development automation of real-time programming languages and inspires many research questions that can be jointly investigated by the embedded systems, programming languages as well as MDE communities.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115230319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Crash recoverable ARMv8-oriented B+-tree for byte-addressable persistent memory 可崩溃恢复的面向armv8的B+树,用于字节寻址的持久内存
Chundong Wang, Sudipta Chattopadhyay, Gunavaran Brihadiswarn
{"title":"Crash recoverable ARMv8-oriented B+-tree for byte-addressable persistent memory","authors":"Chundong Wang, Sudipta Chattopadhyay, Gunavaran Brihadiswarn","doi":"10.1145/3316482.3326358","DOIUrl":"https://doi.org/10.1145/3316482.3326358","url":null,"abstract":"The byte-addressable non-volatile memory (NVM) promises persistent memory. Concretely, ARM processors have incorporated architectural supports to utilize NVM. In this paper, we consider tailoring the important B+-tree for NVM operated by a 64-bit ARMv8 processor. We first conduct an empirical study of performance overheads in writing and reading data for a B+-tree with an ARMv8 processor, including the time cost of cache line flushes and memory fences for crash consistency as well as the execution time of binary search compared to that of linear search. We hence identify the key weaknesses in the design of B+-tree with ARMv8 architecture. Accordingly, we develop a new B+-tree variant, namely, crash recoverable ARMv8-oriented B+-tree (Crab-tree). To insert and delete data at runtime, Crab-tree selectively chooses one of two strategies, i.e., copy on write and shifting in place, depending on which one causes less consistency cost to performance. Crab-tree regulates a strict execution order in both strategies and recovers the tree structure in case of crashes. We have evaluated Crab-tree in Raspberry Pi 3 Model B+ with emulated NVM. Experiments show that Crab-tree significantly outperforms state-of-the-art B+-trees designed for persistent memory by up to 2.6x and 3.2x in write and read performances, respectively, with both consistency and scalability achieved.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"2001 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125743985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Efficient intermittent computing with differential checkpointing 具有差分检查点的高效间歇计算
Saad Ahmed, Naveed Anwar Bhatti, Muhammad Hamad Alizai, J. H. Siddiqui, L. Mottola
{"title":"Efficient intermittent computing with differential checkpointing","authors":"Saad Ahmed, Naveed Anwar Bhatti, Muhammad Hamad Alizai, J. H. Siddiqui, L. Mottola","doi":"10.1145/3316482.3326357","DOIUrl":"https://doi.org/10.1145/3316482.3326357","url":null,"abstract":"Embedded devices running on ambient energy perform computations intermittently, depending upon energy availability. System support ensures forward progress of programs through state checkpointing in non-volatile memory. Checkpointing is, however, expensive in energy and adds to execution times. To reduce this overhead, we present DICE, a system design that efficiently achieves differential checkpointing in intermittent computing. Distinctive traits of DICE are its software-only nature and its ability to only operate in volatile main memory to determine differentials. DICE works with arbitrary programs using automatic code instrumentation, thus requiring no programmer intervention, and can be integrated with both reactive (Hibernus) or proactive (MementOS, HarvOS) checkpointing systems. By reducing the cost of checkpoints, performance markedly improves. For example, using DICE, Hibernus requires one order of magnitude shorter time to complete a fixed workload in real-world settings.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123337646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 46
Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads 优化张量收缩的嵌入式设备与赛道记忆刮擦板
A. Khan, Norman A. Rink, F. Hameed, J. Castrillón
{"title":"Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads","authors":"A. Khan, Norman A. Rink, F. Hameed, J. Castrillón","doi":"10.1145/3316482.3326351","DOIUrl":"https://doi.org/10.1145/3316482.3326351","url":null,"abstract":"Tensor contraction is a fundamental operation in many algorithms with a plethora of applications ranging from quantum chemistry over fluid dynamics and image processing to machine learning. The performance of tensor computations critically depends on the efficient utilization of on-chip memories. In the context of low-power embedded devices, efficient management of the memory space becomes even more crucial, in order to meet energy constraints. This work aims at investigating strategies for performance- and energy-efficient tensor contractions on embedded systems, using racetrack memory (RTM)-based scratch-pad memory (SPM). Compiler optimizations such as the loop access order and data layout transformations paired with architectural optimizations such as prefetching and preshifting are employed to reduce the shifting overhead in RTMs. Experimental results demonstrate that the proposed optimizations improve the SPM performance and energy consumption by 24% and 74% respectively compared to an iso-capacity SRAM.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115224804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
IA-graph based inter-app conflicts detection in open IoT systems 开放物联网系统中基于ia图的应用间冲突检测
Xinyi Li, Lei Zhang, Xipeng Shen
{"title":"IA-graph based inter-app conflicts detection in open IoT systems","authors":"Xinyi Li, Lei Zhang, Xipeng Shen","doi":"10.1145/3316482.3326350","DOIUrl":"https://doi.org/10.1145/3316482.3326350","url":null,"abstract":"This paper tackles the problem of detecting potential conflicts among independently developed apps that are to be installed into an open Internet of Things (IoT) environment. It provides a new set of definitions and categorizations of the conflicts to more precisely characterize the nature of the problem, and employs a graph representation (named IA Graph) for formally representing IoT controls and inter-app interplays. It provides an efficient conflicts detection algorithm implemented on a SmartThings compiler and shows significantly improved efficacy over prior solutions.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121636293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
BitBench: a benchmark for bitstream computing BitBench:比特流计算基准
Kyle Daruwalla, Heng Zhuo, C. Schulz, Mikko H. Lipasti
{"title":"BitBench: a benchmark for bitstream computing","authors":"Kyle Daruwalla, Heng Zhuo, C. Schulz, Mikko H. Lipasti","doi":"10.1145/3316482.3326355","DOIUrl":"https://doi.org/10.1145/3316482.3326355","url":null,"abstract":"With the recent increase in ultra-low power applications, researchers are investigating alternative architectures that can operate on streaming input data. These target use cases require complex algorithms that must be evaluated under a real-time deadline, but also satisfy the strict available power budget. Stochastic computing (SC) is an example of an alternative paradigm where the data is represented as single bitstreams, allowing designers to implement operations such as multiplication using a simple AND gate. Consequently, the resulting design is both low area and low power. Similarly, traditional digital filters can take advantage of streaming inputs to effectively choose coefficients, resulting in a low cost implementation. In this work, we construct six key algorithms to characterize bitstream computing. We present these algorithms as a new benchmark suite: BitBench.","PeriodicalId":256029,"journal":{"name":"Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126728898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信