A Commit Classification Framework Incorporated With Prompt Tuning and External Knowledge
Jiajun Tong, Xiaobin Rui
IET Software, vol. 2025, no. 1, published 2025-04-26. DOI: 10.1049/sfw2/5566134
Commit classification is an important task in software maintenance because it helps developers sort code changes into types according to their nature and purpose. This lets them track how development is progressing, identify areas that need improvement, and make informed decisions about when and how to release new versions of their software. However, existing methods are all discriminative models, usually with complex architectures that need additional output layers to produce class-label probabilities; this makes them task-specific and unable to learn features shared across tasks. They also require large amounts of labeled data for fine-tuning, and they struggle to learn effective classification boundaries when labeled data are limited. To address these problems, we propose a generative framework that incorporates prompt tuning for commit classification with external knowledge (IPCK). IPCK simplifies the model structure and learns features across tasks, using only the commit message as input. First, we build the generative framework on T5 (Text-to-Text Transfer Transformer). Its encoder–decoder architecture casts the different commit classification tasks as a single text-to-text problem, so no extra output layer is needed. Second, instead of fine-tuning, we design a prompt-tuning scheme that works in few-shot scenarios with only a handful of labeled samples. Third, we incorporate external knowledge from a knowledge graph into the verbalizer, the step that maps predicted word probabilities to the final labels, to improve performance in few-shot scenarios.
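The text-to-text framing and the knowledge-expanded verbalizer described above can be illustrated with a small, dependency-free sketch. Everything in it is an assumption for illustration, not the authors' implementation: the template string, the use of T5's `<extra_id_0>` sentinel as the answer slot, the class names (the classic maintenance categories corrective, adaptive and perfective, since the abstract does not name its classes), the label-word lists standing in for knowledge-graph expansions, and the simple mean-probability aggregation. In a real pipeline, `word_probs` would come from the T5 decoder's output distribution at the answer slot.

```python
from typing import Dict, List

# Hypothetical cloze-style template: the commit message is the only input, and
# T5's sentinel token <extra_id_0> marks the slot the decoder should fill.
TEMPLATE = "Commit message: {message} Type of change: <extra_id_0>"

# Hypothetical knowledge-expanded label words: each class maps to several
# related words (standing in for knowledge-graph neighbours of the class name)
# instead of a single label word.
LABEL_WORDS: Dict[str, List[str]] = {
    "corrective": ["fix", "bug", "repair", "error"],
    "adaptive": ["feature", "add", "support", "new"],
    "perfective": ["refactor", "cleanup", "improve", "simplify"],
}


def build_prompt(message: str) -> str:
    """Wrap a raw commit message in the template as a text-to-text input."""
    return TEMPLATE.format(message=message.strip())


def verbalize(word_probs: Dict[str, float],
              label_words: Dict[str, List[str]]) -> Dict[str, float]:
    """Map word probabilities at the answer slot to normalized class scores.

    A class's raw score is the mean probability of its label words (words
    absent from word_probs count as 0). Raw scores are renormalized to sum
    to 1; if every raw score is 0, a uniform distribution is returned.
    """
    raw = {
        label: sum(word_probs.get(w, 0.0) for w in words) / len(words)
        for label, words in label_words.items()
    }
    total = sum(raw.values())
    if total == 0.0:
        return {label: 1.0 / len(raw) for label in raw}
    return {label: score / total for label, score in raw.items()}


def predict(word_probs: Dict[str, float],
            label_words: Dict[str, List[str]] = LABEL_WORDS) -> str:
    """Return the class with the highest verbalized score."""
    scores = verbalize(word_probs, label_words)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(build_prompt("Fix NPE in parser"))
    # Made-up decoder probabilities for a few vocabulary words.
    probs = {"fix": 0.40, "bug": 0.20, "feature": 0.05, "refactor": 0.10, "the": 0.25}
    print(verbalize(probs, LABEL_WORDS))
    print(predict(probs))  # prints: corrective
```

The intuition behind a knowledge-expanded verbalizer is that a class score aggregated over several related words depends less on whether the model happens to favor one particular token, which matters most when there are too few labeled examples to adapt the model to a single label word.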
Extensive experiments on two publicly available datasets show that our framework handles commit classification simply but effectively, reaching 90% accuracy on single-label binary classification and 83% on single-label multiclass classification. In few-shot scenarios, our method also improves the model's adaptability without requiring a large number of training samples for fine-tuning.
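The abstract does not say how its few-shot training sets were drawn. A common protocol, assumed here, is to sample K labeled commits per class with a fixed random seed so that runs are reproducible; the hypothetical helper `sample_k_shot` below sketches that protocol using only the Python standard library.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (commit message, label)


def sample_k_shot(examples: Sequence[Example], k: int, seed: int = 0) -> List[Example]:
    """Draw up to k examples per label with a fixed seed.

    Labels with fewer than k examples contribute all of them, so the
    result has sum(min(k, n_label)) items.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    rng = random.Random(seed)
    subset: List[Example] = []
    # Iterate labels in sorted order so a given seed always yields the same subset.
    for label in sorted(by_label):
        pool = by_label[label]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset


if __name__ == "__main__":
    data = [
        ("Fix crash on empty input", "corrective"),
        ("Fix typo in error message", "corrective"),
        ("Repair broken build", "corrective"),
        ("Add YAML config support", "adaptive"),
        ("Support Python 3.12", "adaptive"),
        ("Refactor parser module", "perfective"),
    ]
    # With k=2: 2 corrective + 2 adaptive + 1 perfective (only one available).
    print(len(sample_k_shot(data, k=2, seed=42)))  # prints: 5
```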
About the journal:
IET Software publishes papers on all aspects of the software lifecycle, including design, development, implementation and maintenance. The focus of the journal is on the methods used to develop and maintain software, and their practical application.
Authors are especially encouraged to submit papers on the following topics, although papers on all aspects of software engineering are welcome:
Software and systems requirements engineering
Formal methods, design methods, practice and experience
Software architecture, aspect and object orientation, reuse and re-engineering
Testing, verification and validation techniques
Software dependability and measurement
Human systems engineering and human-computer interaction
Knowledge engineering; expert and knowledge-based systems, intelligent agents
Information systems engineering
Application of software engineering in industry and commerce
Software engineering technology transfer
Management of software development
Theoretical aspects of software development
Machine learning
Big data and big code
Cloud computing
Current special issues (calls for papers):
Knowledge Discovery for Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_KDSD.pdf
Big Data Analytics for Sustainable Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_BDASSD.pdf