Xinyue Long;Shikai Guo;Yu Chai;Hui Li;Sumaira Ameer Jan;Qian Ma;Qiao Ning
IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 1008-1023, 2025 (published 2025-01-28). DOI: 10.1109/TCE.2025.3535638. https://ieeexplore.ieee.org/document/10856218/
Lingual-Fusion Adapter-Based Transfer Learning for Low-Resource Code Vulnerability Detection
Software vulnerabilities pose significant security threats to modern systems, particularly those involving complex execution sequences and intricate call relationships across multiple execution points. For instance, when a software system integrates legacy code written in a low-resource programming language such as PHP, detecting vulnerabilities is difficult because of data scarcity and the complexity of temporal relationships among code fragments. This scarcity makes it hard to capture the temporal features needed to identify vulnerabilities that span multiple execution points. Consequently, existing approaches have two major limitations: they neglect temporal information in code fragments, and they lack sufficient data for models to generalize effectively in low-resource languages. To address these challenges, we introduce TaVer, an approach that improves vulnerability detection in low-resource languages by extracting complex temporal features from code fragments and by using parameter-efficient transfer learning to leverage knowledge shared with resource-rich languages. TaVer comprises two key components:
1) Code Vulnerability Detection Component: models temporal dependencies using execution paths extracted from Abstract Syntax Trees (ASTs), capturing both short-term variations and long-term dependencies among code fragments. This enables comprehensive extraction of complex temporal features and significantly improves detection accuracy.
2) Cross-Lingual Transfer Component: learns generalizable features from resource-rich languages and transfers them efficiently to low-resource languages. By updating only a small number of downstream parameters, it improves model generalization and achieves precise vulnerability detection.
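The abstract does not specify how TaVer derives execution paths from ASTs. As a rough, hypothetical illustration of the idea, the sketch below uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) to enumerate statement-level paths through a function, forking at each `if` and stopping a path at `return`. The helper names (`execution_paths`, `_seq_paths`, `_stmt_paths`) and the sample function are invented for this sketch. Loops are not unrolled. The output is only one plausible kind of sequence a temporal model could consume; it is not the authors' extraction procedure.

```python
import ast

# A path is a list of one-line statement labels, in execution order.
Path = list[str]


def _stmt_paths(stmt: ast.stmt) -> list[tuple[Path, bool]]:
    """Alternatives for one statement as (steps, ends_in_return) pairs."""
    if isinstance(stmt, ast.If):
        cond = ast.unparse(stmt.test)
        taken = [([f"if {cond}: true"] + s, r) for s, r in _seq_paths(stmt.body)]
        skipped = [([f"if {cond}: false"] + s, r) for s, r in _seq_paths(stmt.orelse)]
        return taken + skipped  # `elif` arrives as a nested If in `orelse`
    # Simplification: every other statement, including loops, is one opaque step.
    label = ast.unparse(stmt).splitlines()[0]
    return [([label], isinstance(stmt, ast.Return))]


def _seq_paths(stmts: list[ast.stmt]) -> list[tuple[Path, bool]]:
    """Cross product of the alternatives of each statement in a block."""
    paths: list[tuple[Path, bool]] = [([], False)]
    for stmt in stmts:
        extended: list[tuple[Path, bool]] = []
        for steps, returned in paths:
            if returned:  # statements after a `return` are unreachable here
                extended.append((steps, True))
                continue
            for sub_steps, sub_returned in _stmt_paths(stmt):
                extended.append((steps + sub_steps, sub_returned))
        paths = extended
    return paths


def execution_paths(source: str) -> list[Path]:
    """Statement-level paths through the first function defined in `source`."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    return [steps for steps, _ in _seq_paths(func.body)]


# Hypothetical input: the code is only parsed, never executed.
example = """
def load(user, path):
    if user.is_admin:
        data = open(path).read()
    else:
        return None
    log(data)
    return data
"""

for p in execution_paths(example):
    print(" -> ".join(p))
# if user.is_admin: true -> data = open(path).read() -> log(data) -> return data
# if user.is_admin: false -> return None
```

Each path preserves the order in which statements would run. That ordering is what a sequence model would need in order to relate an early step (reading a file) to a later one (using `data`).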
We evaluated TaVer on a diverse set of programming languages drawn from publicly available GitHub repositories, using C as the resource-rich source language and Java, Python, and PHP as relatively low-resource target languages. Experimental results show that TaVer outperforms four state-of-the-art approaches across multiple low-resource languages. Specifically, compared with the best baseline approaches, TaVer achieves average improvements of 14.63% in Accuracy, 30.59% in Precision, 37.32% in Recall, and 33.65% in F1-score.
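The cross-lingual transfer component is described as updating only a small number of downstream parameters. One common way to achieve this is a residual bottleneck adapter inserted into a frozen pretrained encoder, so that only the adapter's down- and up-projection matrices are trained. The sketch below is a generic illustration of that idea in pure Python, not TaVer's own Lingual-Fusion Adapter, whose design the abstract does not give. It omits biases, layer normalization, and the training loop. The class name `BottleneckAdapter` and the sizes used (12 layers of width 768, bottleneck 64, a 125M-parameter backbone) are hypothetical choices for illustration.

```python
import random


class BottleneckAdapter:
    """Residual bottleneck adapter: out = h + W_up · relu(W_down · h).

    Generic adapter for illustration (biases and layer norm omitted); it is
    NOT the paper's Lingual-Fusion Adapter, whose design is not described here.
    """

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection d_model -> d_bottleneck, small random init.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        # Up-projection d_bottleneck -> d_model, zero init so the adapter
        # starts as the identity and leaves the frozen backbone's output intact.
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, h: list[float]) -> list[float]:
        z = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in self.w_down]
        delta = [sum(w * x for w, x in zip(row, z)) for row in self.w_up]
        return [x + d for x, d in zip(h, delta)]  # residual connection

    def num_params(self) -> int:
        """Trainable weights: the only parameters updated during transfer."""
        return sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)


# Hypothetical sizes: 12 encoder layers of width 768, bottleneck 64,
# and a 125M-parameter backbone that stays frozen.
n_layers, backbone_params = 12, 125_000_000
trainable = n_layers * BottleneckAdapter(768, 64).num_params()
print(f"{trainable:,} trainable weights "
      f"({trainable / backbone_params:.2%} of the frozen backbone)")
# 1,179,648 trainable weights (0.94% of the frozen backbone)
```

The zero-initialized up-projection is a common design choice. At the start of transfer, the adapted model behaves exactly like the frozen source-language model, and the small trainable budget then specializes it to the target language.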
About the journal:
The main focus for the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass market electronics, systems, software and services for consumers.