Assessing Generalizability of CodeBERT

2021 IEEE International Conference on Software Maintenance and Evolution (ICSME). Pub Date: 2021-09-01. DOI: 10.26226/morressier.613b5418842293c031b5b630

Xin Zhou, Donggyun Han, David Lo


Citations: 37

Abstract

Pre-trained models like BERT have achieved strong improvements on many natural language processing (NLP) tasks, demonstrating their generalizability. Their success in NLP has inspired pre-trained models for programming languages. CodeBERT, a recently proposed model for both natural language (NL) and programming language (PL), is pre-trained on a code search dataset. Although promising, CodeBERT has not been evaluated on NL-PL tasks beyond its pre-training dataset. It has also been shown to be effective only on two tasks that are close in nature to its pre-training data. This raises two questions: Can CodeBERT generalize beyond its pre-training data? Can it generalize to various software engineering tasks involving NL and PL? Our work answers these questions through an empirical investigation into the generalizability of CodeBERT. First, we assess the generalizability of CodeBERT to datasets other than its pre-training data. Specifically, for the code search task, we conduct experiments on another dataset containing Python code snippets and their corresponding documentation. We also consider a further dataset of questions and answers about Python programming collected from Stack Overflow. Second, to assess the generalizability of CodeBERT to various software engineering tasks, we apply CodeBERT to the just-in-time defect prediction task. Our empirical results support the generalizability of CodeBERT on the additional data and task. CodeBERT-based solutions achieve performance higher than or comparable to that of specialized solutions designed for the code search and just-in-time defect prediction tasks. However, this performance comes with a tradeoff: for example, CodeBERT requires much more computational resources than specialized code search approaches.
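The code search task mentioned in the abstract can be pictured as a ranking problem: given a natural-language query, order candidate code snippets by relevance. The following is a minimal sketch of that framing only, not the authors' pipeline. The `embed` function is a hypothetical bag-of-tokens stand-in for a learned encoder such as CodeBERT, and `cosine`, `rank_snippets`, and the sample snippets are illustrative names and data that do not appear in the paper.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a learned encoder: a bag-of-tokens count vector.

    A real system would replace this with model-produced representations.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_snippets(query, snippets):
    """Return snippet indices ordered from most to least similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), i) for i, s in enumerate(snippets)]
    # Sort by descending score; ties are broken by original position.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Illustrative candidate pool (made up for this sketch).
snippets = [
    "def read_file(path):\n    return open(path).read()",
    "def sort_list(items):\n    return sorted(items)",
    "def http_get(url):\n    return requests.get(url).text",
]
ranking = rank_snippets("sort a list of items", snippets)
```

Here `ranking[0]` is the index of the snippet judged most relevant to the query. Swapping the toy `embed` for a neural encoder changes the quality of the ranking and the compute cost, which is the tradeoff the abstract points to, while the retrieval framing stays the same.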



