Long Zhang, Xiaolin Ju, Lina Gong, Jiyu Wang, Zilong Ren
Applied Soft Computing, volume 182, Article 113612. Published 2025-07-22. DOI: 10.1016/j.asoc.2025.113612
https://www.sciencedirect.com/science/article/pii/S1568494625009238
Enhancing long-tailed software vulnerability type classification via adaptive data augmentation and prompt tuning
Software vulnerability type classification (SVTC) is essential for efficient and targeted remediation of vulnerabilities. With the rapid increase in software vulnerabilities, the demand for automated SVTC approaches is becoming increasingly critical. However, SVTC is significantly affected by the long-tailed problem, in which the distribution of vulnerability types is highly unbalanced. Specifically, a small number of head classes contain a large volume of samples, while many tail classes contain only a few samples each. This imbalance poses a significant challenge to the classification accuracy of existing approaches. To alleviate this challenge, we propose VulTC-LTPF, an approach that integrates prompt tuning with long-tailed learning to enhance the effectiveness of SVTC. Within VulTC-LTPF, we develop an adaptive error-rate-based data augmentation strategy. This strategy allows the SVTC model to dynamically augment data for tail classes with limited sample sizes during training, thereby mitigating the impact of the long-tailed problem. Furthermore, VulTC-LTPF employs a hybrid prompt tuning strategy that aligns the training process more closely with pre-training, which enhances adaptability to downstream tasks. Unlike existing approaches that rely solely on either the vulnerability description or the source code, VulTC-LTPF leverages both sources of information. By combining hard and soft prompts, it supports a more comprehensive and effective classification strategy. Experimental results demonstrate that VulTC-LTPF achieves substantial performance improvements over four state-of-the-art SVTC baselines, with gains ranging from 26.1% to 55.1% in MCC. Ablation studies further validate the effectiveness of the adaptive data augmentation, prompt tuning, the integration of two types of vulnerability information, and the use of hybrid prompts. These findings indicate that VulTC-LTPF is a promising advance in SVTC, with significant potential for further progress on software vulnerability type classification.
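The abstract does not spell out how the error-rate-based augmentation is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that after each training epoch the per-class error rate is measured, and that classes with few samples receive an augmentation budget proportional to that error rate. The function names, the `tail_threshold` and `max_extra_per_sample` parameters, and the CWE labels in the usage note are all hypothetical.

```python
import math
from collections import Counter


def per_class_error_rates(y_true, y_pred):
    """Return, for each true class, the fraction of its samples that were misclassified."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {label: errors[label] / totals[label] for label in totals}


def augmentation_budget(y_true, y_pred, tail_threshold, max_extra_per_sample=3):
    """Return how many synthetic samples to add per class before the next epoch.

    Assumption (not from the paper): only classes with fewer than
    `tail_threshold` samples are augmented, and the budget grows linearly
    with the class's current error rate.
    """
    totals = Counter(y_true)
    rates = per_class_error_rates(y_true, y_pred)
    budget = {}
    for label, count in totals.items():
        if count < tail_threshold:
            # A poorly classified tail class gets more synthetic samples.
            budget[label] = math.ceil(rates[label] * max_extra_per_sample * count)
        else:
            # Head classes are left untouched.
            budget[label] = 0
    return budget
```

For example, calling `augmentation_budget` on six samples labeled `"CWE-79"` and two labeled `"CWE-20"` with `tail_threshold=4` would leave the head class alone and assign extra samples only to the tail class, scaled by how often it is currently misclassified. Recomputing the budget every epoch is what makes such a scheme adaptive: as the model improves on a tail class, that class's budget shrinks.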
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.