{"title":"Efficient approximations for cache-conscious data placement","authors":"A. Ahmadi, Majid Daliri, A. K. Goharshady, Andreas Pavlogiannis","doi":"10.1145/3519939.3523436","DOIUrl":"https://doi.org/10.1145/3519939.3523436","url":null,"abstract":"There is a huge and growing gap between the speed of accesses to data stored in main memory vs cache. Thus, cache misses account for a significant portion of runtime overhead in virtually every program and minimizing them has been an active research topic for decades. The primary and most classical formal model for this problem is that of Cache-conscious Data Placement (CDP): given a commutative cache with constant capacity k and a sequence Σ of accesses to data elements, the goal is to map each data element to a cache line such that the total number of cache misses over Σ is minimized. Note that we are considering an offline single-threaded setting in which Σ is known a priori. CDP has been widely studied since the 1990s. In POPL 2002, Petrank and Rawitz proved a notoriously strong hardness result: They showed that for every k ≥ 3, CDP is not only NP-hard but also hard-to-approximate within any non-trivial factor unless P=NP. As such, all subsequent works gave up on theoretical improvements and instead focused on heuristic algorithms with no theoretical guarantees. In this work, we present the first-ever positive theoretical result for CDP. The fundamental idea behind our approach is that real-world instances of the problem have specific structural properties that can be exploited to obtain efficient algorithms with strong approximation guarantees. Specifically, the access graphs corresponding to many real-world access sequences are sparse and tree-like. This was already well-known in the community but has only been used to design heuristics without guarantees. In contrast, we provide fixed-parameter tractable algorithms that provably approximate the optimal number of cache misses within any factor 1 + є, assuming that the access graph of a specific degree dє is sparse, i.e. sparser real-world instances lead to tighter approximations. Our theoretical results are accompanied by an experimental evaluation in which our approach outperforms past heuristics over small caches with a handful of lines. However, the approach cannot currently handle large real-world caches and making it scalable in practice is a direction for future work.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127851807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep and shallow types for gradual languages","authors":"B. Greenman","doi":"10.1145/3519939.3523430","DOIUrl":"https://doi.org/10.1145/3519939.3523430","url":null,"abstract":"Sound gradual types come in many forms and offer varying levels of soundness. Two extremes are deep types and shallow types. Deep types offer compositional guarantees but depend on expensive higher-order contracts. Shallow types enforce only local properties, but can be implemented with first-order checks. This paper presents a language design that supports both deep and shallow types to utilize their complementary strengths. In the mixed language, deep types satisfy a strong complete monitoring guarantee and shallow types satisfy a first-order notion of type soundness. The design serves as the blueprint for an implementation in which programmers can easily switch between deep and shallow to leverage their distinct advantages. On the GTP benchmark suite, the median worst-case overhead drops from several orders of magnitude down to 3x relative to untyped. Where an exhaustive search is feasible, 40% of all configurations run fastest with a mix of deep and shallow types.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126152612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hamband: RDMA replicated data types","authors":"F. Houshmand, Javad Saberlatibari, M. Lesani","doi":"10.1145/3519939.3523426","DOIUrl":"https://doi.org/10.1145/3519939.3523426","url":null,"abstract":"Data centers are increasingly equipped with RDMAs. These network interfaces mark the advent of a new distributed system model where a node can directly access the remote memory of another. They have enabled microsecond-scale replicated services. The underlying replication protocols of these systems execute all operations under strong consistency. However, strong consistency can hinder response time and availability, and recent replication models have turned to a hybrid of strong and relaxed consistency. This paper presents RDMA well-coordinated replicated data types, the first hybrid replicated data types for the RDMA network model. It presents a novel operational semantics for these data types that considers three distinct categories of methods and captures their required coordination, and formally proves that they preserve convergence and integrity. It implements these semantics in a system called Hamband that leverages direct remote accesses to efficiently implement the required coordination protocols. The empirical evaluation shows that Hamband outperforms the throughput of existing message-based and strongly consistent implementations by more than 17x and 2.7x respectively.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131917455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Choosing mathematical function implementations for speed and accuracy","authors":"Ian Briggs, P. Panchekha","doi":"10.1145/3519939.3523452","DOIUrl":"https://doi.org/10.1145/3519939.3523452","url":null,"abstract":"Standard implementations of functions like sin and exp optimize for accuracy, not speed, because they are intended for general-purpose use. But just like many applications tolerate inaccuracy from cancellation, rounding error, and singularities, many application could also tolerate less-accurate function implementations. This raises an intriguing possibility: speeding up numerical code by using different function implementations. This paper thus introduces OpTuner, an automated tool for selecting the best implementation for each mathematical function call site. OpTuner uses error Taylor series and integer linear programming to compute optimal assignments of 297 function implementations to call sites and presents the user with a speed-accuracy Pareto curve. In a case study on the POV-Ray ray tracer, OpTuner speeds up a critical computation by 2.48x, leading to a whole program speedup of 1.09x with no change in the program output; human efforts result in slower code and lower-quality output. On a broader study of 36 standard benchmarks, OpTuner demonstrates speedups of 2.05x for negligible decreases in accuracy and of up to 5.37x for error-tolerant applications.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133618018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diaframe: automated verification of fine-grained concurrent programs in Iris","authors":"Ike Mulder, R. Krebbers, H. Geuvers","doi":"10.1145/3519939.3523432","DOIUrl":"https://doi.org/10.1145/3519939.3523432","url":null,"abstract":"Fine-grained concurrent programs are difficult to get right, yet play an important role in modern-day computers. We want to prove strong specifications of such programs, with minimal user effort, in a trustworthy way. In this paper, we present Diaframe—an automated and foundational verification tool for fine-grained concurrent programs. Diaframe is built on top of the Iris framework for higher-order concurrent separation logic in Coq, which already has a foundational soundness proof and the ability to give strong specifications, but lacks automation. Diaframe equips Iris with strong automation using a novel, extendable, goal-directed proof search strategy, using ideas from linear logic programming and bi-abduction. A benchmark of 24 examples from the literature shows that the proof burden of Diaframe is competitive with existing non-foundational tools, while its expressivity and soundness guarantees are stronger.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123165627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"All you need is superword-level parallelism: systematic control-flow vectorization with SLP","authors":"Yishen Chen, Charith Mendis, Saman P. Amarasinghe","doi":"10.1145/3519939.3523701","DOIUrl":"https://doi.org/10.1145/3519939.3523701","url":null,"abstract":"Superword-level parallelism (SLP) vectorization is a proven technique for vectorizing straight-line code. It works by replacing independent, isomorphic instructions with equivalent vector instructions. Larsen and Amarasinghe originally proposed using SLP vectorization (together with loop unrolling) as a simpler, more flexible alternative to traditional loop vectorization. However, this vision of replacing traditional loop vectorization has not been realized because SLP vectorization cannot directly reason with control flow. In this work, we introduce SuperVectorization, a new vectorization framework that generalizes SLP vectorization to uncover parallelism that spans different basic blocks and loop nests. With the capability to systematically vectorize instructions across control-flow regions such as basic blocks and loops, our framework simultaneously subsumes the roles of inner-loop, outer-loop, and straight-line vectorizer while retaining the flexibility of SLP vectorization (e.g., partial vectorization). Our evaluation shows that a single instance of our vectorizer is competitive with and, in many cases, significantly better than LLVM’s vectorization pipeline, which includes both loop and SLP vectorizers. For example, on an unoptimized, sequential volume renderer from Pharr and Mark, our vectorizer gains a 3.28× speedup, whereas none of the production compilers that we tested vectorizes to its complex control-flow constructs.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126450493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualization question answering using introspective program synthesis","authors":"Yanju Chen, Xifeng Yan, Yu Feng","doi":"10.1145/3519939.3523709","DOIUrl":"https://doi.org/10.1145/3519939.3523709","url":null,"abstract":"While data visualization plays a crucial role in gaining insights from data, generating answers over complex visualizations from natural language questions is far from an easy task. Mainstream approaches reduce data visualization queries to a semantic parsing problem, which either relies on expensive-to-annotate supervised training data that pairs natural language questions with logical forms, or weakly supervised models that incorporate a larger corpus but fail on long-tailed queries without explanations. This paper aims to answer data visualization queries by automatically synthesizing the corresponding program from natural language. At the core of our technique is an abstract synthesis engine that is bootstrapped by an off-the-shelf weakly supervised model and an optimal synthesis algorithm guided by triangle alignment constraints, which represent consistency among natural language, visualization, and the synthesized program. Starting with a few tentative answers obtained from an off-the-shelf statistical model, our approach first involves an abstract synthesizer that generates a set of sketches that are consistent with the answers. Then we design an instance of optimal synthesis to complete one of the candidate sketches by satisfying common type constraints and maximizing the consistency among three parties, i.e., natural language, the visualization, and the candidate program. We implement the proposed idea in a system called Poe that can answer visualization queries from natural language. Our method is fully automated and does not require users to know the underlying schema of the visualizations. We evaluate Poe on 629 visualization queries and our experiment shows that Poe outperforms state-of-the-arts by improving the accuracy from 44% to 59%.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125304964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RustHornBelt: a semantic foundation for functional verification of Rust programs with unsafe code","authors":"Yusuke Matsushita, Xavier Denis, Jacques-Henri Jourdan, Derek Dreyer","doi":"10.1145/3519939.3523704","DOIUrl":"https://doi.org/10.1145/3519939.3523704","url":null,"abstract":"Rust is a systems programming language that offers both low-level memory operations and high-level safety guarantees, via a strong ownership type system that prohibits mutation of aliased state. In prior work, Matsushita et al. developed RustHorn, a promising technique for functional verification of Rust code: it leverages the strong invariants of Rust types to express the behavior of stateful Rust code with first-order logic (FOL) formulas, whose verification is amenable to off-the-shelf automated techniques. RustHorn’s key idea is to use prophecies to describe the behavior of mutable borrows. However, the soundness of RustHorn was only established for a safe subset of Rust, and it has remained unclear how to extend it to support various safe APIs that encapsulate unsafe code (i.e., code where Rust’s aliasing discipline is relaxed). In this paper, we present RustHornBelt, the first machine-checked proof of soundness for RustHorn-style verification which supports giving FOL specs to safe APIs implemented with unsafe code. RustHornBelt employs the approach of semantic typing used in Jung et al.’s RustBelt framework, but it extends RustBelt’s model to reason not only about safety but also functional correctness. The key challenge in RustHornBelt is to develop a semantic model of RustHorn-style prophecies, which we achieve via a new separation-logic mechanism we call parametric prophecies.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"3 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116675756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Islaris: verification of machine code against authoritative ISA semantics","authors":"Michael Sammler, Angus Hammond, Rodolphe Lepigre, B. Campbell, Jean Pichon-Pharabod, Derek Dreyer, Deepak Garg, Peter Sewell","doi":"10.1145/3519939.3523434","DOIUrl":"https://doi.org/10.1145/3519939.3523434","url":null,"abstract":"Recent years have seen great advances towards verifying large-scale systems code. However, these verifications are usually based on hand-written assembly or machine-code semantics for the underlying architecture that only cover a small part of the instruction set architecture (ISA). In contrast, other recent work has used Sail to establish formal models for large real-world architectures, including Armv8-A and RISC-V, that are comprehensive (complete enough to boot an operating system or hypervisor) and authoritative (automatically derived from the Arm internal model and validated against the Arm validation suite, and adopted as the official formal specification by RISC-V International, respectively). But the scale and complexity of these models makes them challenging to use as a basis for verification. In this paper, we propose Islaris, the first system to support verification of machine code above these complete and authoritative real-world ISA specifications. Islaris uses a novel combination of SMT-solver-based symbolic execution (the Isla symbolic executor) and automated reasoning in a foundational program logic (a new separation logic we derive using Iris in Coq). We show that this approach can handle Armv8-A and RISC-V machine code exercising a wide range of systems features, including installing and calling exception vectors, code parametric on a relocation address offset (from the production pKVM hypervisor); unaligned access faults; memory-mapped IO; and compiled C code using inline assembly and function pointers.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133164933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Verifying optimizations of concurrent programs in the promising semantics","authors":"Junpeng Zha, Hongjin Liang, Xinyu Feng","doi":"10.1145/3519939.3523734","DOIUrl":"https://doi.org/10.1145/3519939.3523734","url":null,"abstract":"Weak memory models for concurrent programming languages are expected to admit standard compiler optimizations. However, prior works on verifying optimizations in weak memory models are mostly focused on simple optimizations on small code snippets which satisfy certain syntactic requirements. It receives less attention whether weak memory models can admit real-world optimization algorithms based on program analyses. In this paper, we develop the first simulation technique for verifying thread-local analyses-based optimizations in the promising semantics PS2.1, which is a weak memory model recently proposed for C/C++11 concurrency. Our simulation is based on a novel non-preemptive semantics, which is equivalent to the original PS2.1 but has less non-determinism. We apply our simulation to verify four optimizations in PS2.1: constant propagation, dead code elimination, common subexpression elimination and loop invariant code motion.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133608665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}