Luis Ibanez-Lissen, Lorena Gonzalez-Manzano, Jose Maria de Fuentes, Nicolas Anciaux

Journal of Information Security and Applications, Volume 93, Article 104125
DOI: 10.1016/j.jisa.2025.104125
Published: 2025-06-21
https://www.sciencedirect.com/science/article/pii/S2214212625001620
LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs
Large Language Models (LLMs) are being extensively used for cybersecurity purposes. One of these uses is the detection of vulnerable code. For the sake of efficiency and effectiveness, compression and fine-tuning techniques are being developed, respectively. However, both involve substantial computational effort. In this vein, we analyze how Linear Probes (LPs) can be used to provide an early estimate of the performance of a compressed LLM, before fine-tuning. We also show their suitability for setting the cut-off point when applying layer-pruning compression. Our approach, dubbed LPASS, is applied to BERT and Gemma for the detection of 12 of MITRE's Top 25 most dangerous vulnerabilities on 480k C/C++ samples. LPs can be computed in 142.97 s and provide key findings: (1) 33.3% and 72.2% of layers can be removed, respectively, with no precision loss; (2) they provide an early estimate of post-fine-tuning and post-compression model effectiveness, with 3% and 8.68% as the lowest and average precision errors, respectively. LPASS-based LLMs outperform the state of the art, reaching 86.9% accuracy in multi-class vulnerability detection. Interestingly, LPASS-based compressed versions of Gemma outperform the original ones by up to 1.6% in F1-score while saving 29.4% of training time, 23.8% of inference time, and 42.98% of model size.
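The core idea of the abstract — training a cheap linear probe on each layer's hidden states and using the per-layer probe scores to choose a pruning cut-off — can be sketched as follows. This is a hypothetical illustration, not the authors' code: the synthetic "hidden states", the 2-point tolerance, and the gradient-descent probe are all assumptions made for the example.

```python
# Hypothetical sketch of the LPASS idea: fit a linear (logistic-regression)
# probe on each layer's hidden states, then pick the earliest layer whose
# probe accuracy is within a tolerance of the best layer -- a candidate
# cut-off point for layer pruning.
import numpy as np

rng = np.random.default_rng(0)

def probe_accuracy(feats, labels, steps=200, lr=0.1):
    """Fit a logistic-regression probe by gradient descent; return train accuracy."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # sigmoid predictions
        w -= lr * feats.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    preds = (feats @ w + b) > 0
    return float(np.mean(preds == labels))

# Toy stand-in for per-layer hidden states of a transformer: later layers are
# made more linearly separable w.r.t. the (binary) vulnerability label.
n, d, n_layers = 200, 16, 6
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
layers = [rng.normal(size=(n, d)) + np.outer(labels - 0.5, direction) * (l / n_layers)
          for l in range(1, n_layers + 1)]

accs = [probe_accuracy(f, labels) for f in layers]
best = max(accs)
# Earliest layer within 2 accuracy points of the best one (tolerance is an assumption).
cutoff = next(i for i, a in enumerate(accs) if a >= best - 0.02)
print(f"per-layer probe accuracies: {[round(a, 2) for a in accs]}")
print(f"suggested cut-off layer: {cutoff}")
```

Because each probe is a single linear layer fitted on frozen representations, computing all of them is orders of magnitude cheaper than fine-tuning, which is what makes the early estimate practical.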
About the journal:
Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security. JISA provides a common linkage between a vibrant scientific research community and industry professionals by offering a clear view of modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.