Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation最新文献_第2页

Characterising renaming within OCaml’s module system: theory and implementation OCaml模块系统中重命名的特征:理论与实现

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314600

R. Rowe, Hugo Férée, S. Thompson, Scott Owens

引用次数: 5

A fast analytical model of fully associative caches 全关联缓存的快速分析模型

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314606

Tobias Gysi, T. Grosser, Laurin Brandner, T. Hoefler

{"title":"A fast analytical model of fully associative caches","authors":"Tobias Gysi, T. Grosser, Laurin Brandner, T. Hoefler","doi":"10.1145/3314221.3314606","DOIUrl":"https://doi.org/10.1145/3314221.3314606","url":null,"abstract":"While the cost of computation is an easy to understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the cost of data movement. Existing cache models and simulators provide the missing information but are computationally expensive. We present a lightweight cache model for fully associative caches with least recently used (LRU) replacement policy that gives fast and accurate results. We count the cache misses without explicit enumeration of all memory accesses by using symbolic counting techniques twice: 1) to derive the stack distance for each memory access and 2) to count the memory accesses with stack distance larger than the cache size. While this technique seems infeasible in theory, due to non-linearities after the first round of counting, we show that the counting problems are sufficiently linear in practice. Our cache model often computes the results within seconds and contrary to simulation the execution time is mostly problem size independent. Our evaluation measures modeling errors below 0.6% on real hardware. By providing accurate data placement information we enable memory hierarchy aware software development.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115893328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 25

Panthera: holistic memory management for big data processing over hybrid memories Panthera:基于混合内存的大数据处理整体内存管理

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314650

Chenxi Wang, Huimin Cui, Ting Cao, J. Zigman, Haris Volos, O. Mutlu, Fang Lv, Xiaobing Feng, G. Xu

{"title":"Panthera: holistic memory management for big data processing over hybrid memories","authors":"Chenxi Wang, Huimin Cui, Ting Cao, J. Zigman, Haris Volos, O. Mutlu, Fang Lv, Xiaobing Feng, G. Xu","doi":"10.1145/3314221.3314650","DOIUrl":"https://doi.org/10.1145/3314221.3314650","url":null,"abstract":"Modern data-parallel systems such as Spark rely increasingly on in-memory computing that can significantly improve the efficiency of iterative algorithms. To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which are both costly and energy-inefficient. Emerging non-volatile memory (NVM) technologies offers high capacity compared to DRAM and low energy compared to SSDs. Hence, NVMs have the potential to fundamentally change the dichotomy between DRAM and durable storage in Big Data processing. However, most Big Data applications are written in managed languages (e.g., Scala and Java) and executed on top of a managed runtime (e.g., the Java Virtual Machine) that already performs various dimensions of memory management. Supporting hybrid physical memories adds in a new dimension, creating unique challenges in data replacement and migration. This paper proposes Panthera, a semantics-aware, fully automated memory management technique for Big Data processing over hybrid memories. Panthera analyzes user programs on a Big Data system to infer their coarse-grained access patterns, which are then passed down to the Panthera runtime for efficient data placement and migration. For Big Data applications, the coarse-grained data division is accurate enough to guide GC for data layout, which hardly incurs data monitoring and moving overhead. We have implemented Panthera in OpenJDK and Apache Spark. An extensive evaluation with various datasets and applications demonstrates that Panthera reduces energy by 32 – 52% at only a 1 – 9% execution time overhead.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127812685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 48

Usuba: high-throughput and constant-time ciphers, by construction Usuba:高吞吐量和恒定时间的密码，通过构造

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314636

Darius Mercadier, Pierre-Évariste Dagand

引用次数: 11

A complete formal semantics of x86-64 user-level instruction set architecture x86-64用户级指令集体系结构的完整形式化语义

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314601

Sandeep Dasgupta, D. Park, T. Kasampalis, Vikram S. Adve, Grigore Roşu

引用次数: 55

Lazy counterfactual symbolic execution 懒惰的反事实的象征性执行

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314618

William T. Hallahan, Anton Xue, M. Bland, Ranjit Jhala, R. Piskac

引用次数: 15

Compiling KB-sized machine learning models to tiny IoT devices 编译kb大小的机器学习模型到微型物联网设备

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314597

S. Gopinath, N. Ghanathe, V. Seshadri, Rahul Sharma

{"title":"Compiling KB-sized machine learning models to tiny IoT devices","authors":"S. Gopinath, N. Ghanathe, V. Seshadri, Rahul Sharma","doi":"10.1145/3314221.3314597","DOIUrl":"https://doi.org/10.1145/3314221.3314597","url":null,"abstract":"Recent advances in machine learning (ML) have produced KiloByte-size models that can directly run on constrained IoT devices. This approach avoids expensive communication between IoT devices and the cloud, thereby enabling energy-efficient real-time analytics. However, ML models are expressed typically in floating-point, and IoT hardware typically does not support floating-point. Therefore, running these models on IoT devices requires simulating IEEE-754 floating-point using software, which is very inefficient. We present SeeDot, a domain-specific language to express ML inference algorithms and a compiler that compiles SeeDot programs to fixed-point code that can efficiently run on constrained IoT devices. We propose 1) a novel compilation strategy that reduces the search space for some key parameters used in the fixed-point code, and 2) new efficient implementations of expensive operations. SeeDot compiles state-of-the-art KB-sized models to various microcontrollers and low-end FPGAs. We show that SeeDot outperforms 1) software emulation of floating-point (Arduino), 2) high-bitwidth fixed-point (MATLAB), 3) post-training quantization (TensorFlow-Lite), and 4) floating- and fixed-point FPGA implementations generated using high-level synthesis tools.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116089844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 62

Sparse record and replay with controlled scheduling 稀疏记录和回放与控制调度

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314635

Christopher Lidbury, A. Donaldson

{"title":"Sparse record and replay with controlled scheduling","authors":"Christopher Lidbury, A. Donaldson","doi":"10.1145/3314221.3314635","DOIUrl":"https://doi.org/10.1145/3314221.3314635","url":null,"abstract":"Modern applications include many sources of nondeterminism, e.g. due to concurrency, signals, and system calls that interact with the external environment. Finding and reproducing bugs in the presence of this nondeterminism has been the subject of much prior work in three main areas: (1) controlled concurrency-testing, where a custom scheduler replaces the OS scheduler to find subtle bugs; (2) record and replay, where sources of nondeterminism are captured and logged so that a failing execution can be replayed for debugging purposes; and (3) dynamic analysis for the detection of data races. We present a dynamic analysis tool for C++ applications, tsan11rec, which brings these strands of work together by integrating controlled concurrency testing and record and replay into the tsan11 framework for C++11 data race detection. Our novel twist on record and replay is a sparse approach, where the sources of nondeterminism to record can be configured per application. We show that our approach is effective at finding subtle concurrency bugs in small applications; is competitive in terms of performance with the state-of-the-art record and replay tool rr on larger applications; succeeds (due to our sparse approach) in replaying the I/O-intensive Zandronum and QuakeSpasm video games, which are out of scope for rr; but (due to limitations of our sparse approach) cannot faithfully replay applications where memory layout nondeterminism significantly affects application behaviour.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"91 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128017544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

Co-optimizing memory-level parallelism and cache-level parallelism 共同优化内存级并行和缓存级并行

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314599

Xulong Tang, M. Kandemir, Mustafa Karaköy, Meenakshi Arunachalam

{"title":"Co-optimizing memory-level parallelism and cache-level parallelism","authors":"Xulong Tang, M. Kandemir, Mustafa Karaköy, Meenakshi Arunachalam","doi":"10.1145/3314221.3314599","DOIUrl":"https://doi.org/10.1145/3314221.3314599","url":null,"abstract":"Minimizing cache misses has been the traditional goal in optimizing cache performance using compiler based techniques. However, continuously increasing dataset sizes combined with large numbers of cache banks and memory banks connected using on-chip networks in emerging manycores/accelerators makes cache hit–miss latency optimization as important as cache miss rate minimization. In this paper, we propose compiler support that optimizes both the latencies of last-level cache (LLC) hits and the latencies of LLC misses. Our approach tries to achieve this goal by improving the parallelism exhibited by LLC hits and LLC misses. More specifically, it tries to maximize both cache-level parallelism (CLP) and memory-level parallelism (MLP). This paper presents different incarnations of our approach, and evaluates them using a set of 12 multithreaded applications. Our results indicate that (i) optimizing MLP first and CLP later brings, on average, 11.31% performance improvement over an approach that already minimizes the number of LLC misses, and (ii) optimizing CLP first and MLP later brings 9.43% performance improvement. In comparison, balancing MLP and CLP brings 17.32% performance improvement on average.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121672818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

Parser-directed fuzzing Parser-directed起毛

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation Pub Date : 2019-06-08 DOI: 10.1145/3314221.3314651

Björn Mathis, Rahul Gopinath, Michaël Mera, Alexander Kampmann, M. Höschele, A. Zeller

引用次数: 43