A Commit Classification Framework Incorporated With Prompt Tuning and External Knowledge

IF 1.3 | Zone 4 (Computer Science) | JCR Q3: COMPUTER SCIENCE, SOFTWARE ENGINEERING

IET Software Pub Date : 2025-04-26 DOI:10.1049/sfw2/5566134

Jiajun Tong, Xiaobin Rui


Citations: 0

Abstract


A Commit Classification Framework Incorporated With Prompt Tuning and External Knowledge


Commit classification is an important task in software maintenance because it helps developers sort code changes into types according to their nature and purpose. This lets them better understand how development is progressing, identify areas that need improvement, and make informed decisions about when and how to release new versions of their software. However, existing methods are all discriminative models, usually with complex architectures that require additional output layers to produce class-label probabilities, which makes them task-specific and unable to learn features across different tasks. Moreover, they require a large amount of labeled data for fine-tuning, and it is difficult to learn effective classification boundaries when labeled data are limited. To solve these problems, we propose a generative framework that incorporates prompt tuning for commit classification with external knowledge (IPCK). It simplifies the model structure and learns features across different tasks, using only the commit message as input. First, we propose a generative framework based on T5 (Text-to-Text Transfer Transformer). This encoder–decoder design unifies different commit classification tasks into a text-to-text problem and simplifies the model's structure because no extra output layer is required. Second, instead of fine-tuning, we design a prompt-tuning solution that can be adopted in few-shot scenarios with only limited samples. Furthermore, we incorporate external knowledge via an external knowledge graph to map the probabilities of words to the final labels in the speech machine step, which improves performance in few-shot scenarios. Extensive experiments on two openly available datasets demonstrate that our framework solves the commit classification problem simply but effectively for both single-label binary classification and single-label multiclass classification, with 90% and 83% accuracy, respectively. Further, in few-shot scenarios, our method improves the adaptability of the model without requiring a large number of training samples for fine-tuning.
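The mapping from word probabilities to final labels described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the "speech machine step" corresponds to what the prompt-learning literature calls a verbalizer, that the class names and label words below are hand-picked stand-ins for words that would come from an external knowledge graph, and that the template wording in `build_prompt` is hypothetical. Only `<extra_id_0>`, the first sentinel token of T5, is a real identifier; the probabilities passed in are made-up inputs, not model output.

```python
from typing import Dict, List, Optional

# Hypothetical classes and label words (assumption: the paper's real label
# words would be drawn from an external knowledge graph, not hand-picked).
LABEL_WORDS: Dict[str, List[str]] = {
    "corrective": ["fix", "bug", "repair"],
    "adaptive": ["add", "support", "feature"],
    "perfective": ["refactor", "cleanup", "optimize"],
}


def build_prompt(commit_message: str) -> str:
    """Wrap a commit message in a cloze-style template (assumed wording).

    <extra_id_0> is T5's first sentinel token; a T5 model would be asked
    to generate the text that fills this slot.
    """
    return f'Commit message: "{commit_message}" This change is <extra_id_0>.'


def verbalize(
    word_probs: Dict[str, float],
    label_words: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, float]:
    """Average each class's label-word probabilities, then normalize to 1."""
    label_words = label_words or LABEL_WORDS
    raw = {
        label: sum(word_probs.get(word, 0.0) for word in words) / len(words)
        for label, words in label_words.items()
    }
    total = sum(raw.values())
    if total == 0.0:
        # No label word received any probability mass: fall back to uniform.
        return {label: 1.0 / len(raw) for label in raw}
    return {label: score / total for label, score in raw.items()}


def classify(word_probs: Dict[str, float]) -> str:
    """Return the class whose label words carry the most probability."""
    scores = verbalize(word_probs)
    return max(scores, key=scores.get)
```

With the made-up input `{"fix": 0.4, "bug": 0.2, "add": 0.1, "refactor": 0.05}`, `classify` returns `"corrective"`. Averaging over label words, rather than summing, keeps a class with many label words from being favored simply because its word list is longer.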


Source journal

IET Software | Engineering Technology - Computer Science: Software Engineering

CiteScore

4.20

Self-citation rate

0.00%

Publication volume

Review time

9 months

About the journal: IET Software publishes papers on all aspects of the software lifecycle, including design, development, implementation and maintenance. The focus of the journal is on the methods used to develop and maintain software, and their practical application. Authors are especially encouraged to submit papers on the following topics, although papers on all aspects of software engineering are welcome:

Software and systems requirements engineering
Formal methods, design methods, practice and experience
Software architecture, aspect and object orientation, reuse and re-engineering
Testing, verification and validation techniques
Software dependability and measurement
Human systems engineering and human-computer interaction
Knowledge engineering; expert and knowledge-based systems, intelligent agents
Information systems engineering
Application of software engineering in industry and commerce
Software engineering technology transfer
Management of software development
Theoretical aspects of software development
Machine learning
Big data and big code
Cloud computing

Current Special Issue. Call for papers:

Knowledge Discovery for Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_KDSD.pdf
Big Data Analytics for Sustainable Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_BDASSD.pdf