{"title":"Guard Analysis and Safe Erasure Gradual Typing: a Type System for Elixir","authors":"Giuseppe Castagna, Guillaume Duboc","doi":"arxiv-2408.14345","DOIUrl":"https://doi.org/arxiv-2408.14345","url":null,"abstract":"We define several techniques to extend gradual typing with semantic subtyping, specifically targeting dynamic languages. Focusing on the Elixir programming language, we provide the theoretical foundations for its type system. Our approach demonstrates how to achieve type soundness for gradual typing in existing dynamic languages without modifying their compilation, while still maintaining high precision. This is accomplished through the static detection of \"strong functions\", which leverage runtime checks inserted by the programmer or performed by the virtual machine, and through a fine-grained type analysis of pattern-matching expressions with guards.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent Data Structures Made Easy (Extended Version)","authors":"Callista Le, Kiran Gopinathan, Koon Wen Lee, Seth Gilbert, Ilya Sergey","doi":"arxiv-2408.13779","DOIUrl":"https://doi.org/arxiv-2408.13779","url":null,"abstract":"Design of an efficient thread-safe concurrent data structure is a balancing act between its implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workload. At the same time, lock-free concurrent structures allow for high throughput, but are notoriously difficult to get right and require careful reasoning to formally establish their correctness. We explore a solution to this conundrum based on batch parallelism, an approach for designing concurrent data structures via a simple insight: efficiently processing a batch of a priori known operations in parallel is easier than optimising performance for a stream of arbitrary asynchronous requests. Alas, batch-parallel structures have not seen wide practical adoption due to (i) the inconvenience of having to structure multi-threaded programs to explicitly group operations and (ii) the lack of a systematic methodology to implement batch-parallel structures as simply as lock-based ones. We present OBatcher, an OCaml library that streamlines the design, implementation, and usage of batch-parallel structures. It solves the first challenge (how to use) by suggesting a new lightweight implicit batching design that is built on top of generic asynchronous programming mechanisms. The second challenge (how to implement) is addressed by identifying a family of strategies for converting common sequential structures into efficient batch-parallel ones. We showcase OBatcher with a diverse set of benchmarks. Our evaluation of all the implementations on large asynchronous workloads shows that (a) they consistently outperform the corresponding coarse-grained lock-based implementations and that (b) their throughput scales reasonably with the number of processors.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DOCE: Finding the Sweet Spot for Execution-Based Code Generation","authors":"Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins","doi":"arxiv-2408.13745","DOIUrl":"https://doi.org/arxiv-2408.13745","url":null,"abstract":"Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by proposing Decoding Objectives for Code Execution, a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components. We then study the contributions of these components through execution-based evaluation metrics. Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods. Furthermore, we assess the impact of filtering based on trial unit tests, a simple and effective strategy that has often been overlooked in prior works. We also propose self-debugging on multiple candidates, obtaining state-of-the-art performance on reranking for code generation. We expect our framework to provide a solid guideline for future research on code generation.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Which Part of the Heap is Useful? Improving Heap Liveness Analysis","authors":"Vini Kanvar, Uday P. Khedker","doi":"arxiv-2408.12947","DOIUrl":"https://doi.org/arxiv-2408.12947","url":null,"abstract":"With the growing sizes of data structures allocated in the heap, understanding the actual use of heap memory is critically important for minimizing cache misses and reclaiming unused memory. A static analysis aimed at this is difficult because heap locations are unnamed. Using allocation sites to name them creates very few distinctions, making it difficult to identify allocated heap locations that are not used. Heap liveness analysis using access graphs solves this problem by (a) using a storeless model of heap memory by naming the locations with access paths, and (b) representing the unbounded sets of access paths (which are regular languages) as finite automata. We improve the scalability and efficiency of heap liveness analysis, and reduce the amount of computed heap liveness information, by using deterministic automata and by minimizing the inclusion of aliased access paths in the language. Practically, our field-, flow-, context-sensitive liveness analysis on SPEC CPU2006 benchmarks scales to 36 kLoC (the existing analysis scales to 10.5 kLoC) and improves efficiency by up to 99%. For some of the benchmarks, our technique shows a multifold reduction in the computed liveness information, ranging from 2 to 100 times (in terms of the number of live access paths), without compromising on soundness.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LOUD: Synthesizing Strongest and Weakest Specifications","authors":"Kanghee Park, Xuanyu Peng, Loris D'Antoni","doi":"arxiv-2408.12539","DOIUrl":"https://doi.org/arxiv-2408.12539","url":null,"abstract":"Specifications allow us to formally state and understand what programs are intended to do. To help one extract useful properties from code, Park et al. recently proposed a framework that, given (i) a quantifier-free query posed about a set of function definitions, and (ii) a domain-specific language L in which each extracted property is to be expressed (we call properties in the language L-properties), synthesizes a set of L-properties such that each property is a strongest L-consequence for the query: the property is an over-approximation of the query, and there is no other L-property that over-approximates the query and is strictly more precise than it. The framework by Park et al. has two key limitations. First, it only supports quantifier-free query formulas and thus cannot synthesize specifications for queries involving nondeterminism, concurrency, etc. Second, it can only compute L-consequences, i.e., over-approximations of the program behavior. This paper addresses these two limitations and presents a framework, Loud, for synthesizing strongest L-consequences and weakest L-implicants (i.e., under-approximations of the query) for function definitions that can involve existential quantifiers. We implemented a solver, Aspire, for problems expressed in Loud, which can be used to describe and identify sources of bugs in both deterministic and nondeterministic programs, extract properties from concurrent programs, and synthesize winning strategies in two-player games.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LLM4VV: Exploring LLM-as-a-Judge for Validation and Verification Testsuites","authors":"Zachariah Sollenberger, Jay Patel, Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran","doi":"arxiv-2408.11729","DOIUrl":"https://doi.org/arxiv-2408.11729","url":null,"abstract":"Large Language Models (LLMs) are evolving and have significantly revolutionized the landscape of software development. If used well, they can significantly accelerate the software development cycle. At the same time, the community is very cautious of the models being trained on biased or sensitive data, which can lead to biased outputs along with the inadvertent release of confidential information. Additionally, the carbon footprints and the unexplainability of these black-box models continue to raise questions about the usability of LLMs. With the abundance of opportunities LLMs have to offer, this paper explores the idea of judging tests used to evaluate compiler implementations of directive-based programming models, as well as probing into the black box of LLMs. Based on our results, utilizing an agent-based prompting approach and setting up a validation pipeline structure drastically increased the quality of DeepSeek Coder, the LLM chosen for evaluation purposes.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference Plans for Hybrid Particle Filtering","authors":"Ellie Y. Cheng, Eric Atkinson, Guillaume Baudart, Louis Mandel, Michael Carbin","doi":"arxiv-2408.11283","DOIUrl":"https://doi.org/arxiv-2408.11283","url":null,"abstract":"Advanced probabilistic programming languages (PPLs) use hybrid inference systems to combine symbolic exact inference and Monte Carlo methods to improve inference performance. These systems use heuristics to partition random variables within the program into variables that are encoded symbolically and variables that are encoded with sampled values, and the heuristics are not necessarily aligned with the performance evaluation metrics used by the developer. In this work, we present inference plans, a programming interface that enables developers to control the partitioning of random variables during hybrid particle filtering. We further present Siren, a new PPL that enables developers to use annotations to specify inference plans the inference system must implement. To assist developers with statically reasoning about whether an inference plan can be implemented, we present an abstract-interpretation-based static analysis for Siren for determining inference plan satisfiability. We prove the analysis is sound with respect to Siren's semantics. Our evaluation applies inference plans to three different hybrid particle filtering algorithms on a suite of benchmarks and shows that the control provided by inference plans enables speedups of 1.76x on average and up to 206x to reach target accuracy, compared to the inference plans implemented by default heuristics; the results also show that inference plans improve accuracy by 1.83x on average and up to 595x with less or equal runtime, compared to the default inference plans. We further show that the static analysis is precise in practice, identifying all satisfiable inference plans in 27 out of the 33 benchmark-algorithm combinations.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A type system for data flow and alias analysis in ReScript","authors":"Nicky Ask Lund, Hans Hüttel","doi":"arxiv-2408.11954","DOIUrl":"https://doi.org/arxiv-2408.11954","url":null,"abstract":"ReScript introduces a strongly typed language that targets JavaScript, as an alternative to gradually typed languages such as TypeScript. In this paper, we present a type system for data-flow analysis for a subset of the ReScript language, more specifically, for a lambda-calculus with mutability and pattern matching. The type system is a local analysis that collects information about which variables are used, together with alias information.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CI/CD Efforts for Validation, Verification and Benchmarking OpenMP Implementations","authors":"Aaron Jarmusch, Felipe Cabarcas, Swaroop Pophale, Andrew Kallai, Johannes Doerfert, Luke Peyralans, Seyong Lee, Joel Denny, Sunita Chandrasekaran","doi":"arxiv-2408.11777","DOIUrl":"https://doi.org/arxiv-2408.11777","url":null,"abstract":"Software developers must adapt to keep up with the changing capabilities of platforms so that they can utilize the power of High-Performance Computing (HPC), including exascale systems. OpenMP, a directive-based parallel programming model, allows developers to add directives to existing C, C++, or Fortran code to enable node-level parallelism without compromising performance. This paper describes our CI/CD efforts to provide easy evaluation of the support of OpenMP across different compilers using existing testsuites and benchmark suites on HPC platforms. Our main contributions include (1) the setup of a Continuous Integration (CI) and Continuous Development (CD) workflow that captures bugs and provides faster feedback to compiler developers, (2) an evaluation of OpenMP (offloading) implementations supported by AMD, HPE, GNU, LLVM, and Intel, and (3) an evaluation of the quality of compilers across different heterogeneous HPC platforms. With the comprehensive testing through the CI/CD workflow, we aim to provide a comprehensive understanding of the current state of OpenMP (offloading) support in different compilers and heterogeneous platforms consisting of CPUs and GPUs from NVIDIA, AMD, and Intel.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142179574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cage: Hardware-Accelerated Safe WebAssembly","authors":"Martin Fink, Dimitrios Stavrakakis, Dennis Sprokholt, Soham Chakraborty, Jan-Erik Ekberg, Pramod Bhatotia","doi":"arxiv-2408.11456","DOIUrl":"https://doi.org/arxiv-2408.11456","url":null,"abstract":"WebAssembly (WASM) is an immensely versatile and increasingly popular compilation target. It executes applications written in several languages (e.g., C/C++) with near-native performance in various domains (e.g., mobile, edge, cloud). Despite WASM's sandboxing feature, which isolates applications from other instances and the host platform, WASM does not inherently provide any memory safety guarantees for applications written in low-level, unsafe languages. To this end, we propose Cage, a hardware-accelerated toolchain for WASM that supports unmodified applications compiled to WASM and utilizes diverse Arm hardware features aiming to enrich the memory safety properties of WASM. Precisely, Cage leverages Arm's Memory Tagging Extension (MTE) to (i) provide spatial and temporal memory safety for heap and stack allocations and (ii) improve the performance of WASM's sandboxing mechanism. Cage further employs Arm's Pointer Authentication (PAC) to prevent leaked pointers from being reused by other WASM instances, thus enhancing WASM's security properties. We implement our system based on 64-bit WASM. We provide a WASM compiler and runtime with support for Arm's MTE and PAC. On top of that, Cage's LLVM-based compiler toolchain transforms unmodified applications to provide spatial and temporal memory safety for stack and heap allocations and prevent function pointer reuse. Our evaluation on real hardware shows that Cage incurs minimal runtime (<5.8%) and memory (<3.7%) overheads and can improve the performance of WASM's sandboxing mechanism, achieving a speedup of over 5.1%, while offering efficient memory safety guarantees.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}