{"title":"Characterising renaming within OCaml’s module system: theory and implementation","authors":"R. Rowe, Hugo Férée, S. Thompson, Scott Owens","doi":"10.1145/3314221.3314600","DOIUrl":"https://doi.org/10.1145/3314221.3314600","url":null,"abstract":"We present an abstract, set-theoretic denotational semantics for a significant subset of OCaml and its module system, allowing us to reason about the correctness of renaming value bindings. Our semantics captures information about the binding structure of programs, as well as about which declarations are related by the use of different language constructs (e.g. functors, module types and module constraints). Correct renamings are precisely those that preserve this structure. We show that our abstract semantics is sound with respect to a (domain-theoretic) denotational model of the operational behaviour of programs, and that it allows us to prove various high-level, intuitive properties of renamings. This formal framework has been implemented in a prototype refactoring tool for OCaml that performs renaming.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115295577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fast analytical model of fully associative caches","authors":"Tobias Gysi, T. Grosser, Laurin Brandner, T. Hoefler","doi":"10.1145/3314221.3314606","DOIUrl":"https://doi.org/10.1145/3314221.3314606","url":null,"abstract":"While the cost of computation is an easy-to-understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the cost of data movement. Existing cache models and simulators provide the missing information but are computationally expensive. We present a lightweight cache model for fully associative caches with least recently used (LRU) replacement policy that gives fast and accurate results. We count the cache misses without explicit enumeration of all memory accesses by using symbolic counting techniques twice: 1) to derive the stack distance for each memory access and 2) to count the memory accesses with stack distance larger than the cache size. While this technique seems infeasible in theory, due to non-linearities after the first round of counting, we show that the counting problems are sufficiently linear in practice. Our cache model often computes the results within seconds and, contrary to simulation, the execution time is mostly problem size independent. Our evaluation measures modeling errors below 0.6% on real hardware. By providing accurate data placement information, we enable memory-hierarchy-aware software development.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115893328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panthera: holistic memory management for big data processing over hybrid memories","authors":"Chenxi Wang, Huimin Cui, Ting Cao, J. Zigman, Haris Volos, O. Mutlu, Fang Lv, Xiaobing Feng, G. Xu","doi":"10.1145/3314221.3314650","DOIUrl":"https://doi.org/10.1145/3314221.3314650","url":null,"abstract":"Modern data-parallel systems such as Spark rely increasingly on in-memory computing that can significantly improve the efficiency of iterative algorithms. To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which are both costly and energy-inefficient. Emerging non-volatile memory (NVM) technologies offer high capacity compared to DRAM and low energy compared to SSDs. Hence, NVMs have the potential to fundamentally change the dichotomy between DRAM and durable storage in Big Data processing. However, most Big Data applications are written in managed languages (e.g., Scala and Java) and executed on top of a managed runtime (e.g., the Java Virtual Machine) that already performs various dimensions of memory management. Supporting hybrid physical memories adds in a new dimension, creating unique challenges in data replacement and migration. This paper proposes Panthera, a semantics-aware, fully automated memory management technique for Big Data processing over hybrid memories. Panthera analyzes user programs on a Big Data system to infer their coarse-grained access patterns, which are then passed down to the Panthera runtime for efficient data placement and migration. For Big Data applications, the coarse-grained data division is accurate enough to guide GC for data layout, which hardly incurs data monitoring and moving overhead. We have implemented Panthera in OpenJDK and Apache Spark. An extensive evaluation with various datasets and applications demonstrates that Panthera reduces energy by 32 – 52% at only a 1 – 9% execution time overhead.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127812685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Usuba: high-throughput and constant-time ciphers, by construction","authors":"Darius Mercadier, Pierre-Évariste Dagand","doi":"10.1145/3314221.3314636","DOIUrl":"https://doi.org/10.1145/3314221.3314636","url":null,"abstract":"Cryptographic primitives are subject to diverging imperatives. Functional correctness and auditability push for the use of a high-level programming language. Performance and the threat of timing attacks push for using nothing more abstract than an assembler to exploit (or avoid!) the micro-architectural features of a given machine. We believe that a suitable programming language can reconcile both views and actually improve on the state of the art of both. Usuba is an opinionated dataflow programming language in which block ciphers become so simple as to be “obviously correct” and whose types document and enforce valid parallelization strategies at the granularity of individual bits. Its optimizing compiler, Usubac, produces high-throughput, constant-time implementations performing on par with hand-tuned reference implementations. The cornerstone of our approach is a systematization and generalization of bitslicing, an implementation trick frequently used by cryptographers.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116971990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A complete formal semantics of x86-64 user-level instruction set architecture","authors":"Sandeep Dasgupta, D. Park, T. Kasampalis, Vikram S. Adve, Grigore Roşu","doi":"10.1145/3314221.3314601","DOIUrl":"https://doi.org/10.1145/3314221.3314601","url":null,"abstract":"We present the most complete and thoroughly tested formal semantics of x86-64 to date. Our semantics faithfully formalizes all the non-deprecated, sequential user-level instructions of the x86-64 Haswell instruction set architecture. This totals 3155 instruction variants, corresponding to 774 mnemonics. The semantics is fully executable and has been tested against more than 7,000 instruction-level test cases and the GCC torture test suite. This extensive testing paid off, revealing bugs in both the x86-64 reference manual and other existing semantics. We also illustrate potential applications of our semantics in different formal analyses, and discuss how it can be useful for processor verification.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131164306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lazy counterfactual symbolic execution","authors":"William T. Hallahan, Anton Xue, M. Bland, Ranjit Jhala, R. Piskac","doi":"10.1145/3314221.3314618","DOIUrl":"https://doi.org/10.1145/3314221.3314618","url":null,"abstract":"We present counterfactual symbolic execution, a new approach that produces counterexamples that localize the causes of failure of static verification. First, we develop a notion of symbolic weak head normal form and use it to define lazy symbolic execution reduction rules for non-strict languages like Haskell. Second, we introduce counterfactual branching, a new method to identify places where verification fails due to imprecise specifications (as opposed to incorrect code). Third, we show how to use counterfactual symbolic execution to localize refinement type errors, by translating refinement types into assertions. We implement our approach in a new Haskell symbolic execution engine, G2, and evaluate it on a corpus of 7550 errors gathered from users of the LiquidHaskell refinement type system. We show that for 97.7% of these errors, G2 is able to quickly find counterexamples that show how the code or specifications must be fixed to enable verification.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134571979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiling KB-sized machine learning models to tiny IoT devices","authors":"S. Gopinath, N. Ghanathe, V. Seshadri, Rahul Sharma","doi":"10.1145/3314221.3314597","DOIUrl":"https://doi.org/10.1145/3314221.3314597","url":null,"abstract":"Recent advances in machine learning (ML) have produced KiloByte-size models that can directly run on constrained IoT devices. This approach avoids expensive communication between IoT devices and the cloud, thereby enabling energy-efficient real-time analytics. However, ML models are expressed typically in floating-point, and IoT hardware typically does not support floating-point. Therefore, running these models on IoT devices requires simulating IEEE-754 floating-point using software, which is very inefficient. We present SeeDot, a domain-specific language to express ML inference algorithms and a compiler that compiles SeeDot programs to fixed-point code that can efficiently run on constrained IoT devices. We propose 1) a novel compilation strategy that reduces the search space for some key parameters used in the fixed-point code, and 2) new efficient implementations of expensive operations. SeeDot compiles state-of-the-art KB-sized models to various microcontrollers and low-end FPGAs. We show that SeeDot outperforms 1) software emulation of floating-point (Arduino), 2) high-bitwidth fixed-point (MATLAB), 3) post-training quantization (TensorFlow-Lite), and 4) floating- and fixed-point FPGA implementations generated using high-level synthesis tools.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116089844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse record and replay with controlled scheduling","authors":"Christopher Lidbury, A. Donaldson","doi":"10.1145/3314221.3314635","DOIUrl":"https://doi.org/10.1145/3314221.3314635","url":null,"abstract":"Modern applications include many sources of nondeterminism, e.g. due to concurrency, signals, and system calls that interact with the external environment. Finding and reproducing bugs in the presence of this nondeterminism has been the subject of much prior work in three main areas: (1) controlled concurrency-testing, where a custom scheduler replaces the OS scheduler to find subtle bugs; (2) record and replay, where sources of nondeterminism are captured and logged so that a failing execution can be replayed for debugging purposes; and (3) dynamic analysis for the detection of data races. We present a dynamic analysis tool for C++ applications, tsan11rec, which brings these strands of work together by integrating controlled concurrency testing and record and replay into the tsan11 framework for C++11 data race detection. Our novel twist on record and replay is a sparse approach, where the sources of nondeterminism to record can be configured per application. We show that our approach is effective at finding subtle concurrency bugs in small applications; is competitive in terms of performance with the state-of-the-art record and replay tool rr on larger applications; succeeds (due to our sparse approach) in replaying the I/O-intensive Zandronum and QuakeSpasm video games, which are out of scope for rr; but (due to limitations of our sparse approach) cannot faithfully replay applications where memory layout nondeterminism significantly affects application behaviour.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"91 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128017544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-optimizing memory-level parallelism and cache-level parallelism","authors":"Xulong Tang, M. Kandemir, Mustafa Karaköy, Meenakshi Arunachalam","doi":"10.1145/3314221.3314599","DOIUrl":"https://doi.org/10.1145/3314221.3314599","url":null,"abstract":"Minimizing cache misses has been the traditional goal in optimizing cache performance using compiler based techniques. However, continuously increasing dataset sizes combined with large numbers of cache banks and memory banks connected using on-chip networks in emerging manycores/accelerators make cache hit–miss latency optimization as important as cache miss rate minimization. In this paper, we propose compiler support that optimizes both the latencies of last-level cache (LLC) hits and the latencies of LLC misses. Our approach tries to achieve this goal by improving the parallelism exhibited by LLC hits and LLC misses. More specifically, it tries to maximize both cache-level parallelism (CLP) and memory-level parallelism (MLP). This paper presents different incarnations of our approach, and evaluates them using a set of 12 multithreaded applications. Our results indicate that (i) optimizing MLP first and CLP later brings, on average, 11.31% performance improvement over an approach that already minimizes the number of LLC misses, and (ii) optimizing CLP first and MLP later brings 9.43% performance improvement. In comparison, balancing MLP and CLP brings 17.32% performance improvement on average.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121672818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parser-directed fuzzing","authors":"Björn Mathis, Rahul Gopinath, Michaël Mera, Alexander Kampmann, M. Höschele, A. Zeller","doi":"10.1145/3314221.3314651","DOIUrl":"https://doi.org/10.1145/3314221.3314651","url":null,"abstract":"To be effective, software test generation needs to cover the space of possible inputs well. Traditional fuzzing generates large numbers of random inputs, which, however, are unlikely to contain keywords and other specific inputs of non-trivial input languages. Constraint-based test generation solves conditions of paths leading to uncovered code, but fails on programs with complex input conditions because of path explosion. In this paper, we present a test generation technique specifically directed at input parsers. We systematically produce inputs for the parser and track comparisons made; after every rejection, we satisfy the comparisons leading to rejection. This approach effectively covers the input space: Evaluated on five subjects, from CSV files to JavaScript, our pFuzzer prototype covers more tokens than both random-based and constraint-based approaches, while requiring no symbolic analysis and far fewer tests than random fuzzers.","PeriodicalId":441774,"journal":{"name":"Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115061757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}