Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen
{"title":"Empowering LLMs for Verilog Generation through Multi-Level Summarization","authors":"Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen","doi":"arxiv-2407.10424","DOIUrl":"https://doi.org/arxiv-2407.10424","url":null,"abstract":"The increasing complexity and high costs associated with modern processor\u0000design have led to a surge in demand for processor design automation.\u0000Instruction-tuned large language models (LLMs) have demonstrated remarkable\u0000performance in automatically generating code for general-purpose programming\u0000languages like Python. However, these methods fail on hardware description\u0000languages (HDLs) like Verilog due to the scarcity of high-quality instruction\u0000tuning data, as even advanced LLMs like GPT-3.5 exhibit limited performance on\u0000Verilog generation. Regarding this issue, we observe that (1) Verilog code\u0000collected from the real world has higher quality than those generated by LLMs.\u0000(2) LLMs like GPT-3.5 excel in summarizing Verilog code rather than generating\u0000it. Based on these observations, this paper introduces CodeV, a series of\u0000open-source instruction-tuned Verilog generation LLMs. Instead of generating\u0000descriptions first and then getting the corresponding code from advanced LLMs,\u0000we prompt the LLM with Verilog code and let the LLM generate the corresponding\u0000natural language description by multi-level summarization. Experimental results\u0000show that CodeV relatively surpasses the previous open-source SOTA by 14.4%\u0000(BetterV in VerilogEval) and 11.3% (RTLCoder in RTLLM) respectively, and also\u0000relatively outperforms previous commercial SOTA GPT-4 by 22.1% in VerilogEval.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marwa Naïr, Kamel Yamani, Lynda Said Lhadj, Riyadh Baghdadi
{"title":"Curriculum Learning for Small Code Language Models","authors":"Marwa Naïr, Kamel Yamani, Lynda Said Lhadj, Riyadh Baghdadi","doi":"arxiv-2407.10194","DOIUrl":"https://doi.org/arxiv-2407.10194","url":null,"abstract":"Code language models have emerged as useful tools for various programming\u0000tasks, yet they often struggle when it comes to complex ones. In this paper, we\u0000explore the potential of curriculum learning in enhancing the performance of\u0000these models. While prior research has suggested that curriculum learning does\u0000not necessarily help in improving the performance of language models, our\u0000results surprisingly show that this may not be the case for code language\u0000models. We demonstrate that a well-designed curriculum learning approach\u0000significantly improves the accuracy of small decoder-only code language models\u0000on the task of code execution, while its effect on code completion is less\u0000significant. To explore the potential of curriculum learning, we train multiple\u0000GPT models with 1 million parameters each to predict the next token and\u0000evaluate them on code completion and execution tasks. Our contributions include\u0000proposing a novel code difficulty assessment metric by combining software code\u0000measures, investigating the effectiveness of Curriculum Learning for code\u0000language models, and introducing a Novel Curriculum Learning schedule that\u0000enhances the performance of small decoder-only language models in code\u0000execution tasks. The results of this paper open the door for more research on\u0000the use of curriculum learning for code language models.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining Name Accessibility using Scope Graphs (Extended Edition)","authors":"Aron Zwaan, Casper Bach Poulsen","doi":"arxiv-2407.09320","DOIUrl":"https://doi.org/arxiv-2407.09320","url":null,"abstract":"Many programming languages allow programmers to regulate accessibility; i.e.,\u0000annotating a declaration with keywords such as export and private to indicate\u0000where it can be accessed. Despite the importance of name accessibility for,\u0000e.g., compilers, editor auto-completion and tooling, and automated\u0000refactorings, few existing type systems provide a formal account of name\u0000accessibility. We present a declarative, executable, and language-parametric model for name\u0000accessibility, which provides a formal specification of name accessibility in\u0000Java, C#, C++, Rust, and Eiffel. We achieve this by defining name accessibility\u0000as a predicate on resolution paths through scope graphs. Since scope graphs are\u0000a language-independent model of name resolution, our model provides a uniform\u0000approach to defining different accessibility policies for different languages. Our model is implemented in Statix, a logic language for executable type\u0000system specification using scope graphs. We evaluate its correctness on a test\u0000suite that compares it with the C#, Java, and Rust compilers, and show we can\u0000synthesize access modifiers in programs with holes accurately.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Refinements for Multiparty Message-Passing Protocols: Specification-agnostic theory and implementation","authors":"Vassor Martin, Yoshida Nobuko","doi":"arxiv-2407.09106","DOIUrl":"https://doi.org/arxiv-2407.09106","url":null,"abstract":"Multiparty message-passing protocols are notoriously difficult to design, due\u0000to interaction mismatches that lead to errors such as deadlocks. Existing\u0000protocol specification formats have been developed to prevent such errors (e.g.\u0000multiparty session types (MPST)). In order to further constrain protocols,\u0000specifications can be extended with refinements, i.e. logical predicates to\u0000control the behaviour of the protocol based on previous values exchanged.\u0000Unfortunately, existing refinement theories and implementations are tightly\u0000coupled with specification formats. This paper proposes a framework for\u0000multiparty message-passing protocols with refinements and its implementation in\u0000Rust. Our work decouples correctness of refinements from the underlying model\u0000of computation, which results in a specification-agnostic framework. Our\u0000contributions are threefold. First, we introduce a trace system which\u0000characterises valid refined traces, i.e. a sequence of sending and receiving\u0000actions correct with respect to refinements. Second, we give a correct model of\u0000computation named refined communicating system (RCS), which is an extension of\u0000communicating automata systems with refinements. We prove that RCS only produce\u0000valid refined traces. We show how to generate RCS from mainstream protocol\u0000specification formats, such as refined multiparty session types (RMPST) or\u0000refined choreography automata. Third, we illustrate the flexibility of the\u0000framework by developing both a static analysis technique and an improved model\u0000of computation for dynamic refinement evaluation. Finally, we provide a Rust\u0000toolchain for decentralised RMPST, evaluate our implementation with a set of\u0000benchmarks from the literature, and observe that refinement overhead is\u0000negligible.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Higher-Order Specificationsfor Deductive Synthesis of Programs with Pointers (Extended version)","authors":"David Young, Ziyi Yang, Ilya Sergey, Alex Potanin","doi":"arxiv-2407.09143","DOIUrl":"https://doi.org/arxiv-2407.09143","url":null,"abstract":"Synthetic Separation Logic (SSL) is a formalism that powers SuSLik, the\u0000state-of-the-art approach for the deductive synthesis of provably-correct\u0000programs in C-like languages that manipulate Heap-based linked data structures.\u0000Despite its expressivity, SSL suffers from two shortcomings that hinder its\u0000utility. First, its main specification component, inductive predicates, only\u0000admits emph{first-order} definitions of data structure shapes, which leads to\u0000the proliferation of ``boiler-plate'' predicates for specifying common\u0000patterns. Second, SSL requires emph{concrete} definitions of data structures\u0000to synthesise programs that manipulate them, which results in the need to\u0000change a specification for a synthesis task every time changes are introduced\u0000into the layout of the involved structures. We propose to significantly lift the level of abstraction used in writing\u0000Separation Logic specifications for synthesis -- both simplifying the approach\u0000and making the specifications more usable and easy to read and follow. We avoid\u0000the need to repetitively re-state low-level representation details throughout\u0000the specifications -- allowing the reuse of different implementations of the\u0000same data structure by abstracting away the details of a specific layout used\u0000in memory. Our novel textit{high-level front-end language} called Pika\u0000significantly improves the expressiveness of SuSLik. We implemented a layout-agnostic synthesiser from Pika to SuSLik enabling\u0000push-button synthesis of C programs with in-place memory updates, along with\u0000the accompanying full proofs that they meet Separation Logic-style\u0000specifications, from high-level specifications that resemble ordinary\u0000functional programs. Our experiments show that our tool can produce C code that\u0000is comparable in its performance characteristics and is sometimes faster than\u0000Haskell.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pragmatics of Formally Verified Yet Efficient Static Analysis, in particular for Formally Verified Compilers","authors":"David MonniauxVERIMAG - IMAG","doi":"arxiv-2407.08258","DOIUrl":"https://doi.org/arxiv-2407.08258","url":null,"abstract":"Formally verified compilers and formally verified static analyzers are a\u0000solution to the problem that certain industries face when they have to\u0000demonstrate to authorities that the object code they run truly corresponds to\u0000its source code and that it satisfies certain properties. From a scientific and\u0000technological point of view, they are a challenge: not only a number of\u0000nontrivial invariants and algorithms must be proved to be correct, but also the\u0000implementation must be reasonably effective so that the tools operate within\u0000reasonable time. Many optimizations in compilers rely on static analysis, and\u0000thus a formally verified compiler entails formally verified static analyses.In\u0000this article, we explain some difficulties, possible solutions, design choices\u0000and trade-offs pertaining to verified static analysis, in particular when the\u0000solution of the analysis is expressed as some form of tree, map or set.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional Programming in Learning Electromagnetic Theory","authors":"Scott N. WalckLebanon Valley College","doi":"arxiv-2407.08090","DOIUrl":"https://doi.org/arxiv-2407.08090","url":null,"abstract":"Electromagnetic theory is central to physics. An undergraduate major in\u0000physics typically takes a semester or a year of electromagnetic theory as a\u0000junior or senior, and a graduate student in physics typically takes an\u0000additional semester or year at a more advanced level. In fall 2023, the author\u0000taught his undergraduate electricity and magnetism class using numerical\u0000methods in Haskell in parallel with traditional analytical methods. This\u0000article describes what functional programming has to offer to physics in\u0000general, and electromagnetic theory in particular. We give examples from vector\u0000calculus, the mathematical language in which electromagnetic theory is\u0000expressed, and electromagnetic theory itself.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mover Logic: A Concurrent Program Logic for Reduction and Rely-Guarantee Reasoning (Extended Version)","authors":"Cormac Flanagan, Stephen N. Freund","doi":"arxiv-2407.08070","DOIUrl":"https://doi.org/arxiv-2407.08070","url":null,"abstract":"Rely-guarantee (RG) logic uses thread interference specifications (relies and\u0000guarantees) to reason about the correctness of multithreaded software.\u0000Unfortunately, RG logic requires each function postcondition to be \"stabilized\"\u0000or specialized to the behavior of other threads, making it difficult to write\u0000function specifications that are reusable at multiple call sites. This paper presents mover logic, which extends RG logic to address this\u0000problem via the notion of atomic functions. Atomic functions behave as if they\u0000execute serially without interference from concurrent threads, and so they can\u0000be assigned more general and reusable specifications that avoid the\u0000stabilization requirement of RG logic. Several practical verifiers (Calvin-R,\u0000QED, CIVL, Armada, Anchor, etc.) have demonstrated the modularity benefits of\u0000atomic function specifications. However, the complexity of these systems and\u0000their correctness proofs makes it challenging to understand and extend these\u0000systems. Mover logic formalizes the central ideas reduction in a declarative\u0000program logic that may provide foundation for future work in this area.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systematic Mapping Study on Teaching of Security Concepts in Programming Courses","authors":"Alina Torbunova, Adnan Ashraf, Ivan Porres","doi":"arxiv-2407.07511","DOIUrl":"https://doi.org/arxiv-2407.07511","url":null,"abstract":"Context: To effectively defend against ever-evolving cybersecurity threats,\u0000software systems should be made as secure as possible. To achieve this,\u0000software developers should understand potential vulnerabilities and apply\u0000secure coding practices. To prepare these skilled professionals, it is\u0000important that cybersecurity concepts are included in programming courses\u0000taught at universities. Objective: To present a comprehensive and unbiased\u0000literature review on teaching of cybersecurity concepts in programming courses\u0000taught at universities. Method: We perform a Systematic Mapping Study. We\u0000present six research questions, define our selection criteria, and develop a\u0000classification scheme. Results and Conclusions: We select 24 publications. Our\u0000results show a wide range of research contributions. We also outline guidelines\u0000and identify opportunities for future studies. The guidelines include coverage\u0000of security knowledge categories and evaluation of contributions. We suggest\u0000that future studies should cover security issues, negative impacts, and\u0000countermeasures, as well as apply evaluation techniques that examine students'\u0000knowledge. The opportunities for future studies are related to advanced\u0000courses, security knowledge frameworks, and programming environments.\u0000Furthermore, there is a need of a holistic security framework that covers the\u0000security concepts identified in this study and is suitable for education.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Programming Language Case Studies Can Be Deep","authors":"Rose BohrerWorcester Polytechnic Institute","doi":"arxiv-2407.08091","DOIUrl":"https://doi.org/arxiv-2407.08091","url":null,"abstract":"In the pedagogy of programming languages, one well-known course structure is\u0000to tour multiple languages as a means of touring paradigms. This\u0000tour-of-paradigms approach has long received criticism as lacking depth,\u0000distracting students from foundational issues in language theory and\u0000implementation. This paper argues for disentangling the idea of a\u0000tour-of-languages from the tour-of-paradigms. We make this argument by\u0000presenting, in depth, a series of case studies included in the Human-Centered\u0000Programming Languages curriculum. In this curriculum, case studies become deep,\u0000serving to tour the different intellectual foundations through which a scholar\u0000can approach programming languages, which one could call the tour-of-humans. In\u0000particular, the design aspect of programming languages has much to learn from\u0000the social sciences and humanities, yet these intellectual foundations would\u0000yield far fewer deep contributions if we did not permit them to employ case\u0000studies.","PeriodicalId":501197,"journal":{"name":"arXiv - CS - Programming Languages","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}