COTE: Predicting Code-to-Test Co-Evolution by Integrating Link Analysis and Pre-Trained Language Model Techniques

Impact factor 5.6; Zone 1 (Computer Science); JCR Q1 (Computer Science, Software Engineering)

IEEE Transactions on Software Engineering Pub Date : 2025-06-27 DOI:10.1109/TSE.2025.3583027

Yuyong Liu;Zhifei Chen;Lin Chen;Yanhui Li;Xuansong Li;Wei Song

Volume 51, Issue 8, pp. 2232-2253. Journal Article. Not open access. https://ieeexplore.ieee.org/document/11053682/

Citations: 0

Abstract

Tests are an essential artifact and should co-evolve with production code to ensure that the code satisfies its specification. However, developers often postpone or even forget to update tests, leaving them outdated and lagging behind the code. Predicting which tests need to be updated when production code changes is challenging: complex change scenarios make it hard to identify all related tests and to determine their change probabilities. This paper fills the gap and proposes a hybrid approach named COTE to predict code-to-test co-evolution. We first compute the linked test candidates based on different code-to-test dependencies. After that, we identify common co-change patterns by building a method-level dependence graph. For the remaining ambiguous patterns, we leverage a pre-trained language model, which captures the semantic features of code and the change reasons contained in commit messages, to judge a test’s likelihood of being updated. Experiments on our datasets consisting of 6,314 samples extracted from 5,000 Java projects show that COTE outperforms state-of-the-art approaches, achieving a precision of 89.0% and a recall of 71.6%. This work can help practitioners reduce test maintenance costs and improve software quality.
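The three stages named in the abstract (candidate linking, pattern matching, and a learned fallback for ambiguous cases) can be illustrated with a small sketch. This is not COTE's implementation: the `Change` record, the call map, the "signature changed" rule, and `keyword_scorer` are all hypothetical stand-ins chosen for illustration, and the keyword heuristic merely occupies the slot where a pre-trained language model would sit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Change:
    """Hypothetical record of one production-code change."""
    method: str              # changed production method, e.g. "Cart.total"
    signature_changed: bool  # True if parameters or return type changed
    commit_message: str


def linked_candidates(change: Change, calls: Dict[str, Set[str]]) -> List[str]:
    """Stage 1 (assumed form): tests that directly call the changed method."""
    return sorted(t for t, callees in calls.items() if change.method in callees)


def predict(
    change: Change,
    calls: Dict[str, Set[str]],
    scorer: Callable[[str, Change], float],
    threshold: float = 0.5,
) -> Dict[str, bool]:
    """Return, for each linked test, whether it is predicted to need an update."""
    result: Dict[str, bool] = {}
    for test in linked_candidates(change, calls):
        if change.signature_changed:
            # Stage 2 (illustrative rule): a test that calls a method whose
            # signature changed is treated as a clear-cut co-change.
            result[test] = True
        else:
            # Stage 3: ambiguous case, defer to a scoring model.
            result[test] = scorer(test, change) >= threshold
    return result


def keyword_scorer(test: str, change: Change) -> float:
    """Placeholder for a pre-trained language model; NOT a real model."""
    message = change.commit_message.lower()
    return 0.9 if any(k in message for k in ("fix", "behavior")) else 0.1


# Hypothetical test-to-callee map for a tiny project.
calls = {
    "CartTest.testTotal": {"Cart.total"},
    "CartTest.testAdd": {"Cart.add"},
}
print(predict(Change("Cart.total", False, "Fix rounding in total"), calls, keyword_scorer))
```

Putting the rule-based stage before the scorer mirrors the ordering described in the abstract: inexpensive structural evidence is used first, and the model is consulted only for candidates the rules cannot settle.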




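Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months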

About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.