Yan Xiao, Xinyue Zuo, Xiaoyue Lu, Jin Song Dong, Xiaochun Cao, Ivan Beschastnikh
Neural Networks, vol. 184, Article 107067. Journal article, published 2024-12-24. DOI: 10.1016/j.neunet.2024.107067. Impact factor: 6.3; JCR Q1 (Computer Science, Artificial Intelligence). https://www.sciencedirect.com/science/article/pii/S0893608024009961
Promises and perils of using Transformer-based models for SE research
Many Transformer-based pre-trained models for code have been developed and applied to code-related tasks. In this paper, we analyze 519 papers published on this topic during 2017–2023, examine the suitability of model architectures for different tasks, summarize their resource consumption, and look at the generalization ability of models on different datasets.
We examine three representative pre-trained models for code (CodeBERT, CodeGPT, and CodeT5) and conduct experiments on the four software engineering tasks most frequently targeted in the literature: Bug Fixing, Bug Detection, Code Summarization, and Code Search.
We make four important empirical contributions to the field. First, we demonstrate that encoder-only models (CodeBERT) can outperform encoder–decoder models for general-purpose coding tasks, and showcase the capability of decoder-only models (CodeGPT) for certain generation tasks. Second, we study the most frequently used model-task combinations in the literature and find that less popular models can provide higher performance. Third, we find that CodeBERT is efficient in understanding tasks while CodeT5’s efficiency is unreliable on generation tasks due to its high resource consumption. Fourth, we report on poor model generalization for the most popular benchmarks and datasets on Bug Fixing and Code Summarization tasks.
We frame our contributions in terms of promises and perils, and document the numerous practical issues in advancing future research on Transformer-based models for code-related tasks.
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.