Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems: Latest Publications

PANDORA: a parallelizing approximation-discovery framework (WIP paper)

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326345

G. Stitt, David Campbell

Citations: 3

A compiler-based approach for GPGPU performance calibration using TLP modulation (WIP paper)

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326343

Yongseung Yu, Seokwon Kang, Yongjun Park

Abstract: Modern GPUs are the most successful accelerators, as they provide outstanding performance gains through the CUDA and OpenCL programming models. For maximum performance, programmers typically try to maximize the number of thread blocks of target programs, and GPUs also generally attempt to allocate the maximum number of thread blocks to their GPU cores. However, many recent studies have pointed out that simply allocating the maximum number of thread blocks to GPU cores does not always guarantee the best performance, and identifying the proper number of thread blocks per GPU core is a major challenge. Despite these studies, most existing architectural techniques cannot be directly applied to current GPU hardware, and the optimal number of thread blocks can vary significantly depending on the target GPU and application characteristics. To solve these problems, this study proposes a just-in-time thread block number adjustment system, referred to as the CTA-Limiter, that uses CUDA binary modification on top of an LLVM compiler framework to dynamically maximize GPU performance on real GPUs without reprogramming. The framework gradually reduces the number of concurrent thread blocks of target CUDA workloads using extra shared memory allocation, and compares the execution time with the previous version to automatically identify the optimal number of co-running thread blocks per GPU core. The results showed meaningful performance improvements, averaging 30%, 40%, and 44% on the GTX 960, GTX 1050, and GTX 1080 Ti, respectively.

Citations: 1
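
The CTA-Limiter entry above describes two mechanisms: capping co-running thread blocks by allocating extra shared memory, and stepping the cap down while comparing execution times. The following is a minimal Python sketch of that idea, not the paper's implementation. It assumes a simplified occupancy model in which shared memory is the only limit (no allocation granularity, no per-block reservation), and it assumes the search stops at the first slowdown. The names `padding_for_block_limit`, `find_best_block_count`, and the callback `run_with_limit` are hypothetical.

```python
from typing import Callable, Tuple


def padding_for_block_limit(smem_per_sm: int, smem_per_block: int,
                            target_blocks: int) -> int:
    """Extra shared-memory bytes per block so at most target_blocks fit on one SM.

    Simplified model (assumption): shared memory is the only occupancy limit,
    with no allocation granularity or per-block reservation.
    """
    if target_blocks < 1:
        raise ValueError("target_blocks must be at least 1")
    # Smallest per-block footprint at which target_blocks + 1 blocks no longer fit.
    footprint = smem_per_sm // (target_blocks + 1) + 1
    return max(0, footprint - smem_per_block)


def find_best_block_count(max_blocks: int,
                          run_with_limit: Callable[[int], float]) -> Tuple[int, float]:
    """Lower the block limit one step at a time until the run gets slower.

    run_with_limit is a hypothetical callback that runs the workload with the
    given number of co-running blocks per SM and returns its execution time.
    """
    best_blocks = max_blocks
    best_time = run_with_limit(max_blocks)
    for blocks in range(max_blocks - 1, 0, -1):
        elapsed = run_with_limit(blocks)
        if elapsed >= best_time:
            break  # assumed stopping rule: no improvement over the previous version
        best_blocks, best_time = blocks, elapsed
    return best_blocks, best_time
```

Under this model, an SM with 98304 bytes of shared memory and a kernel using 4096 bytes per block would need 15565 bytes of padding per block to cap residency at four blocks. Stopping at the first slowdown keeps the number of trial runs small, at the cost of possibly missing a better setting further down the range.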

Raising binaries to LLVM IR with MCTOLL (WIP paper)

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326354

S. B. Yadavalli, Aaron Smith

Abstract: The need to analyze and execute binaries from legacy ISAs on new or different ISAs has been addressed in a variety of ways over the past few decades. Solutions using complementary static and dynamic binary translation techniques have been deployed in most real-world situations. As new ISAs are designed and legacy ISAs re-examined, the need for binary translation infrastructure re-emerges, and it has to be re-engineered all over again. Work is in progress with the goal of making such re-engineering efforts easier by using some of the software tools that would be developed or available for a new or existing ISA in any case. To that end, this paper presents a static binary raiser that translates binaries to LLVM IR. Native binaries for a new ISA are generated from the raised LLVM IR using the LLVM compiler backend. This technique enables development of a single raiser per legacy ISA, irrespective of the new target ISA. The result of such a raiser can then leverage compiler back-ends of new ISAs, thus simplifying the development of a binary translator for the new ISA. This work leverages the existing LLVM infrastructure to implement a static raiser that currently supports raising x64 and Arm32 binaries to LLVM IR. The raiser is built as an LLVM tool, similar to llvm-objdump or clang, and does not have any dependencies outside of those needed to build LLVM. This paper describes the phases of the raiser and gives the current status and limitations.

Citations: 15

ApproxSymate: path sensitive program approximation using symbolic execution

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326341

Himeshi De Silva, A. Santosa, Nhut-Minh Ho, W. Wong

Abstract: Approximate computing, a technique that forgoes quantifiable output accuracy in favor of performance gains, is useful for improving the energy efficiency of error-resilient software, especially in the embedded setting. The identification of program components that can tolerate error plays a crucial role in balancing the energy vs. accuracy trade-off in approximate computing. Manual analysis for approximability is not scalable, and therefore automated tools that employ static or dynamic analysis have been proposed. However, static techniques are often coarse in their approximations, while dynamic efforts incur high overhead. In this work we present ApproxSymate, a framework for automatically identifying program approximations using symbolic execution. ApproxSymate first statically computes symbolic error expressions for program components and then uses a dynamic sensitivity analysis to compute their approximability. A unique feature of this tool is that it explores the previously unconsidered dimension of program paths for approximation, which enables safer transformations. Our evaluation shows that ApproxSymate averages about 96% accuracy in identifying the same approximations found in manually annotated benchmarks, outperforming existing automated techniques.

Citations: 5

From Java to real-time Java: a model-driven methodology with automated toolchain (invited paper)

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326360

Wanli Chang, Shuai Zhao, Ran Wei, A. Wellings, A. Burns

Citations: 4

Crash recoverable ARMv8-oriented B+-tree for byte-addressable persistent memory

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326358

Chundong Wang, Sudipta Chattopadhyay, Gunavaran Brihadiswarn

Abstract: Byte-addressable non-volatile memory (NVM) promises persistent memory. Concretely, ARM processors have incorporated architectural support to utilize NVM. In this paper, we consider tailoring the important B+-tree for NVM operated by a 64-bit ARMv8 processor. We first conduct an empirical study of performance overheads in writing and reading data for a B+-tree with an ARMv8 processor, including the time cost of cache line flushes and memory fences for crash consistency as well as the execution time of binary search compared to that of linear search. We hence identify the key weaknesses in the design of B+-tree with the ARMv8 architecture. Accordingly, we develop a new B+-tree variant, namely, the crash recoverable ARMv8-oriented B+-tree (Crab-tree). To insert and delete data at runtime, Crab-tree selectively chooses one of two strategies, i.e., copy on write and shifting in place, depending on which one incurs the lower consistency cost to performance. Crab-tree enforces a strict execution order in both strategies and recovers the tree structure in case of crashes. We have evaluated Crab-tree on a Raspberry Pi 3 Model B+ with emulated NVM. Experiments show that Crab-tree significantly outperforms state-of-the-art B+-trees designed for persistent memory by up to 2.6x and 3.2x in write and read performance, respectively, with both consistency and scalability achieved.

Citations: 7

Efficient intermittent computing with differential checkpointing

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326357

Saad Ahmed, Naveed Anwar Bhatti, Muhammad Hamad Alizai, J. H. Siddiqui, L. Mottola

Citations: 46

Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326351

A. Khan, Norman A. Rink, F. Hameed, J. Castrillón

Citations: 15

IA-graph based inter-app conflicts detection in open IoT systems

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326350

Xinyi Li, Lei Zhang, Xipeng Shen

Citations: 6

BitBench: a benchmark for bitstream computing

Pub Date: 2019-06-23 DOI: 10.1145/3316482.3326355

Kyle Daruwalla, Heng Zhuo, C. Schulz, Mikko H. Lipasti

Citations: 3