COTE: Predicting Code-to-Test Co-Evolution by Integrating Link Analysis and Pre-Trained Language Model Techniques
Authors: Yuyong Liu; Zhifei Chen; Lin Chen; Yanhui Li; Xuansong Li; Wei Song
Journal: IEEE Transactions on Software Engineering, vol. 51, no. 8, pp. 2232-2253 (JCR Q1, Computer Science, Software Engineering; impact factor 5.6)
DOI: 10.1109/TSE.2025.3583027
Published: 2025-06-27
URL: https://ieeexplore.ieee.org/document/11053682/
Abstract
Tests are an essential artifact and should co-evolve with the production code to ensure that the code satisfies its specification. However, developers often postpone or even forget test updates, leaving the tests outdated and lagging behind the code. Predicting which tests need to be updated when production code changes is challenging, because complex change scenarios make it hard to identify all related tests and determine their change probabilities. This paper fills the gap by proposing COTE, a hybrid approach to predicting code-to-test co-evolution. We first compute the linked test candidates based on different code-to-test dependencies. We then identify common co-change patterns by building a method-level dependence graph. For the remaining ambiguous patterns, we leverage a pre-trained language model, which captures the semantic features of the code and the change reasons contained in commit messages, to judge a test's likelihood of being updated. Experiments on our datasets, comprising 6,314 samples extracted from 5,000 Java projects, show that COTE outperforms state-of-the-art approaches, achieving a precision of 89.0% and a recall of 71.6%. This work can help practitioners reduce test maintenance costs and improve software quality.
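The first two steps of the abstract (linking test candidates, then reasoning over a method-level dependence graph) can be illustrated with a minimal sketch. This is not COTE's implementation: the abstract does not say which dependency kinds or graph construction the authors use, so the two link types below (a `FooTest` naming convention and transitive call reachability), the function names, and the toy call graph are all assumptions made for illustration.

```python
from collections import deque


def name_linked_tests(changed, tests):
    """Hypothetical naming-convention link: class Foo <-> FooTest or TestFoo."""
    cls = changed.split(".")[0]
    wanted = {cls + "Test", "Test" + cls}
    return {t for t in tests if t.split(".")[0] in wanted}


def call_linked_tests(changed, calls, tests):
    """Hypothetical call link: tests that reach `changed` in the call graph.

    calls maps a caller method to the set of methods it calls.
    """
    # Invert the edges so we can walk from the changed method to its callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    seen, queue, linked = {changed}, deque([changed]), set()
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller in seen:
                continue
            seen.add(caller)
            if caller in tests:
                linked.add(caller)    # a test reaches the changed method
            else:
                queue.append(caller)  # keep walking up through production code
    return linked


# Toy method-level graph (invented for this example, not from the paper).
calls = {
    "CartTest.testTotal": {"Cart.total"},
    "Cart.total": {"Price.round"},
    "PriceTest.testRound": {"Price.round"},
    "OrderTest.testId": {"Order.id"},
}
tests = {"CartTest.testTotal", "PriceTest.testRound", "OrderTest.testId"}

by_name = name_linked_tests("Price.round", tests)
by_call = call_linked_tests("Price.round", calls, tests)
candidates = by_name | by_call
```

Here a change to `Price.round` links `PriceTest.testRound` by both name and call, and `CartTest.testTotal` only through the indirect call via `Cart.total`; `OrderTest.testId` is not a candidate. In the approach the abstract describes, candidates whose co-change pattern remains ambiguous would then be passed to the pre-trained language model, which this sketch does not attempt to model.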
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.