Enhancing vulnerability detection efficiency: An exploration of light-weight LLMs with hybrid code features

Jianing Liu, Guanjun Lin, Huan Mei, Fan Yang, Yonghang Tai

Journal of Information Security and Applications, Volume 88, Article 103925. Published 2024-11-30. DOI: 10.1016/j.jisa.2024.103925
https://www.sciencedirect.com/science/article/pii/S2214212624002278
Citations: 0
Abstract
Vulnerability detection is a critical research topic, yet the performance of existing neural network-based approaches requires further improvement. Large language models (LLMs) have demonstrated superior performance in natural language processing (NLP) compared with conventional neural architectures, motivating researchers to apply them to vulnerability detection. This paper evaluates the performance of various Transformer-based LLMs for source-code-level vulnerability detection. We propose a framework named VulACLLM (AST & CFG-based LLMs Vulnerability Detection), which leverages combined feature sets derived from the Abstract Syntax Tree (AST) and Control Flow Graph (CFG). VulACLLM achieves a recall of 0.73 and an F1-score of 0.725 on vulnerability detection, and experimental results show that the proposed feature sets significantly enhance detection performance. To further improve the efficiency of LLM-based detection, we examine the performance of LLMs compressed with two techniques: Knowledge Distillation (KD) and Low-Rank Adaptation (LoRA). To assess these compressed models, we introduce efficiency metrics that quantify both the performance loss and the efficiency gains achieved through compression. Our findings reveal that, compared with KD, LLMs compressed with LoRA achieve higher recall (up to 0.82) while substantially reducing training time (20 min per epoch) and disk footprint (only 4.89 MB of storage). These results demonstrate that LoRA compression effectively mitigates the deployment challenges posed by large model sizes and high GPU memory consumption, enabling LoRA-compressed LLMs to run on consumer-grade GPUs without compromising vulnerability detection performance.
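The paper's implementation is not reproduced here, but the LoRA side of the comparison is straightforward to sketch. The minimal example below, assuming Hugging Face's transformers and peft libraries, shows how a code-pretrained encoder (microsoft/codebert-base is an illustrative choice, not necessarily the authors' base model) can be wrapped with LoRA adapters for binary vulnerability classification. The rank, scaling factor, target modules, and the way AST/CFG features are serialized into the input are all assumptions for illustration, not the paper's configuration.

```python
# Minimal sketch (not the authors' code): LoRA-compressing a Transformer
# encoder for binary vulnerability detection with Hugging Face peft.
# Base checkpoint, hyperparameters, and input serialization are assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

BASE = "microsoft/codebert-base"  # assumed code-pretrained encoder

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2)

# LoRA freezes the base weights and trains small low-rank update matrices
# injected into the attention projections; only these adapters are saved,
# which is why the stored checkpoint is only a few megabytes.
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update (assumed)
    lora_alpha=16,                      # scaling factor (assumed)
    lora_dropout=0.1,
    target_modules=["query", "value"],  # RoBERTa-style attention modules
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # adapters are a tiny fraction of the model

# A hybrid input: source code serialized together with AST/CFG-derived
# tokens. How the paper serializes ASTs and CFGs is not specified here;
# this flat concatenation is only a placeholder.
sample = (
    "static void f(char *s) { char buf[8]; strcpy(buf, s); } "
    "[AST] func_decl array_decl call [CFG] entry->call->exit"
)
inputs = tokenizer(sample, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("P(vulnerable) =", torch.softmax(logits, dim=-1)[0, 1].item())

# After fine-tuning, saving stores only the adapter weights:
# model.save_pretrained("vulacllm-lora-adapter")
```

Because the frozen base weights stay shared and only the low-rank adapters are trained and saved, the artifact that must be stored and shipped is a few megabytes, which is consistent with the 4.89 MB footprint and consumer-GPU deployment the abstract reports.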
About the Journal
The Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security and applications. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view of modern problems and challenges in information security, and by identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.