Probing Numeracy and Logic of Language Models of Code

Razan Baltaji, Parth Thakkar
{"title":"Probing Numeracy and Logic of Language Models of Code","authors":"Razan Baltaji, Parth Thakkar","doi":"10.1109/InteNSE59150.2023.00006","DOIUrl":null,"url":null,"abstract":"Machine learning techniques have found a widespread use in the software engineering community. In particular, language models (LMs) trained on code form the backbone of a majority of these applications, spanning tasks such as code completion, summarization, refactoring, execution prediction, and test generation. These tasks require reasoning about both the syntax and semantics of code. Recent work has shown that language models learn to capture the syntactic properties of code, but it is unclear to what extent they can reason about the semantics of code. In this work, we explore the ability of 3 language models of code to reason about a specific kind of semantics: numerical and logical properties of code. We propose several probing tasks to test the numerical and logical reasoning abilities of these models. We find that the models we explore - CodeBERT, GraphCodeBERT and CodeGen do indeed learn many numerical and logical properties of code, such as finding maximum in a list of numbers, comparing numbers, evaluating boolean expressions and representing numbers. They do not perform as well on complex tasks such as evaluating arithmetic expressions and substituting variables in such expressions. 
Our results indicate that while these models hold promise, there is a lot of room for improvement of their numeric and logical reasoning abilities.","PeriodicalId":166762,"journal":{"name":"2023 IEEE/ACM International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/InteNSE59150.2023.00006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Machine learning techniques have found widespread use in the software engineering community. In particular, language models (LMs) trained on code form the backbone of a majority of these applications, spanning tasks such as code completion, summarization, refactoring, execution prediction, and test generation. These tasks require reasoning about both the syntax and semantics of code. Recent work has shown that language models learn to capture the syntactic properties of code, but it is unclear to what extent they can reason about the semantics of code. In this work, we explore the ability of three language models of code to reason about a specific kind of semantics: numerical and logical properties of code. We propose several probing tasks to test the numerical and logical reasoning abilities of these models. We find that the models we explore (CodeBERT, GraphCodeBERT, and CodeGen) do indeed learn many numerical and logical properties of code, such as finding the maximum in a list of numbers, comparing numbers, evaluating Boolean expressions, and representing numbers. They do not perform as well on complex tasks such as evaluating arithmetic expressions and substituting variables in such expressions. Our results indicate that while these models hold promise, there is a lot of room for improvement in their numerical and logical reasoning abilities.
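The abstract does not detail how the probing datasets are built. As a minimal sketch of what a "find the maximum in a list" probing task might look like, the snippet below generates (code snippet, label) pairs; the label is the position of the largest literal, which a lightweight classifier trained on the LM's hidden states would be asked to predict. All function and variable names here are hypothetical, not from the paper.

```python
import random

def make_max_probe_examples(n, list_len=5, seed=0):
    """Generate (snippet, label) pairs for a hypothetical
    'find the maximum' probing task. Each snippet is a tiny
    program; the label is the index of the largest literal.
    A probe would predict this label from the LM's embeddings."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    examples = []
    for _ in range(n):
        nums = rng.sample(range(100), list_len)  # distinct literals
        snippet = f"xs = {nums}\nm = max(xs)"
        label = nums.index(max(nums))  # index of the maximum
        examples.append((snippet, label))
    return examples

if __name__ == "__main__":
    for snippet, label in make_max_probe_examples(2):
        print(repr(snippet), "->", label)
```

In a full probing pipeline, each snippet would be encoded by CodeBERT, GraphCodeBERT, or CodeGen, and a simple probe (e.g., a linear classifier) would be fit on the frozen representations; high probe accuracy is then taken as evidence that the property is encoded.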