LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs

Impact Factor 3.7 · CAS Zone 2 (Computer Science) · JCR Q2 (Computer Science, Information Systems)
Luis Ibanez-Lissen, Lorena Gonzalez-Manzano, Jose Maria de Fuentes, Nicolas Anciaux
{"title":"LPASS:线性探针作为使用压缩llm进行漏洞检测的垫脚石","authors":"Luis Ibanez-Lissen ,&nbsp;Lorena Gonzalez-Manzano ,&nbsp;Jose Maria de Fuentes ,&nbsp;Nicolas Anciaux","doi":"10.1016/j.jisa.2025.104125","DOIUrl":null,"url":null,"abstract":"<div><div>Large Language Models (LLMs) are being extensively used for cybersecurity purposes. One of them is the detection of vulnerable codes. For the sake of efficiency and effectiveness, compression and fine-tuning techniques are being developed, respectively. However, they involve spending substantial computational efforts. In this vein, we analyze how Linear Probes (LPs) can be used to provide an estimation on the performance of a compressed LLM at an early phase — before fine-tuning. We also show their suitability to set the cut-off point when applying layer pruning compression. Our approach, dubbed <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>, is applied in BERT and Gemma for the detection of 12 of MITRE’s Top 25 most dangerous vulnerabilities on 480k C/C++ samples. LPs can be computed in 142.97 s. and provide key findings: (1) 33.3 % and 72.2% of layers can be removed, respectively, with no precision loss; (2) they provide an early estimate of the post-fine-tuning and post-compression model effectiveness, with 3% and 8.68% as the lowest and average precision errors, respectively. <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based LLMs outperform the state of the art, reaching 86.9% of accuracy in multi-class vulnerability detection. Interestingly, <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based compressed versions of Gemma outperform the original ones by 1.6% of F1-score at a maximum while saving 29.4 % and 23.8% of training and inference time and 42.98% of model size.</div></div>","PeriodicalId":48638,"journal":{"name":"Journal of Information Security and Applications","volume":"93 ","pages":"Article 104125"},"PeriodicalIF":3.7000,"publicationDate":"2025-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs\",\"authors\":\"Luis Ibanez-Lissen ,&nbsp;Lorena Gonzalez-Manzano ,&nbsp;Jose Maria de Fuentes ,&nbsp;Nicolas Anciaux\",\"doi\":\"10.1016/j.jisa.2025.104125\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Large Language Models (LLMs) are being extensively used for cybersecurity purposes. One of them is the detection of vulnerable codes. For the sake of efficiency and effectiveness, compression and fine-tuning techniques are being developed, respectively. However, they involve spending substantial computational efforts. In this vein, we analyze how Linear Probes (LPs) can be used to provide an estimation on the performance of a compressed LLM at an early phase — before fine-tuning. We also show their suitability to set the cut-off point when applying layer pruning compression. Our approach, dubbed <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>, is applied in BERT and Gemma for the detection of 12 of MITRE’s Top 25 most dangerous vulnerabilities on 480k C/C++ samples. LPs can be computed in 142.97 s. 
and provide key findings: (1) 33.3 % and 72.2% of layers can be removed, respectively, with no precision loss; (2) they provide an early estimate of the post-fine-tuning and post-compression model effectiveness, with 3% and 8.68% as the lowest and average precision errors, respectively. <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based LLMs outperform the state of the art, reaching 86.9% of accuracy in multi-class vulnerability detection. Interestingly, <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based compressed versions of Gemma outperform the original ones by 1.6% of F1-score at a maximum while saving 29.4 % and 23.8% of training and inference time and 42.98% of model size.</div></div>\",\"PeriodicalId\":48638,\"journal\":{\"name\":\"Journal of Information Security and Applications\",\"volume\":\"93 \",\"pages\":\"Article 104125\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Security and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2214212625001620\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Security and Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214212625001620","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Large Language Models (LLMs) are being extensively used for cybersecurity purposes, one of which is the detection of vulnerable code. For the sake of efficiency and effectiveness, compression and fine-tuning techniques are being developed, respectively. However, they involve substantial computational effort. In this vein, we analyze how Linear Probes (LPs) can be used to estimate the performance of a compressed LLM at an early phase, before fine-tuning. We also show their suitability for setting the cut-off point when applying layer-pruning compression. Our approach, dubbed LPASS, is applied to BERT and Gemma for the detection of 12 of MITRE's Top 25 most dangerous vulnerabilities on 480k C/C++ samples. LPs can be computed in 142.97 s and provide key findings: (1) 33.3% and 72.2% of layers can be removed, respectively, with no precision loss; (2) they provide an early estimate of post-fine-tuning and post-compression model effectiveness, with 3% and 8.68% as the lowest and average precision errors, respectively. LPASS-based LLMs outperform the state of the art, reaching 86.9% accuracy in multi-class vulnerability detection. Interestingly, LPASS-based compressed versions of Gemma outperform the original ones by up to 1.6% in F1-score while saving 29.4% and 23.8% of training and inference time, respectively, and 42.98% of model size.
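The core idea described in the abstract, fitting a lightweight linear probe on each layer's frozen representations to get an early estimate of post-compression effectiveness and to pick the layer-pruning cut-off, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the model name, the [CLS]-pooling choice, the logistic-regression probe, and the tolerance threshold are all illustrative assumptions.

```python
# Minimal sketch of the linear-probe idea: fit one linear classifier per
# transformer layer on frozen hidden states, then use per-layer probe
# accuracy to pick a layer-pruning cut-off before any fine-tuning.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # illustrative stand-in for the BERT/Gemma models in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

def layer_embeddings(code_samples, batch_size=16):
    """Return one [N, H] array of [CLS] embeddings per transformer layer."""
    per_layer = None
    with torch.no_grad():
        for i in range(0, len(code_samples), batch_size):
            batch = tokenizer(code_samples[i:i + batch_size], truncation=True,
                              padding=True, return_tensors="pt")
            hidden = model(**batch).hidden_states  # (n_layers + 1) tensors of shape [B, T, H]
            cls = [h[:, 0, :].cpu().numpy() for h in hidden[1:]]  # drop the embedding layer
            per_layer = cls if per_layer is None else [
                np.vstack([a, b]) for a, b in zip(per_layer, cls)]
    return per_layer

def probe_accuracies(train_x, train_y, val_x, val_y):
    """Fit one linear probe per layer on frozen embeddings; return validation accuracy per layer."""
    accs = []
    for tr, va in zip(train_x, val_x):  # tr, va: [N, H] arrays for one layer
        probe = LogisticRegression(max_iter=1000).fit(tr, train_y)
        accs.append(probe.score(va, val_y))
    return accs

def pruning_cutoff(accs, tolerance=0.01):
    """Earliest layer whose probe accuracy is within `tolerance` of the best
    layer; deeper layers become pruning candidates (threshold is illustrative)."""
    best = max(accs)
    for layer, acc in enumerate(accs, start=1):
        if acc >= best - tolerance:
            return layer
    return len(accs)

# Example usage (train/val code snippets and labels assumed to exist):
# train_emb = layer_embeddings(train_code); val_emb = layer_embeddings(val_code)
# accs = probe_accuracies(train_emb, train_labels, val_emb, val_labels)
# keep_up_to = pruning_cutoff(accs)  # layers above keep_up_to can be pruned
```

Under this sketch, the value returned by pruning_cutoff marks the deepest layer worth keeping: layers above it would be removed before fine-tuning, which is the kind of early, pre-fine-tuning estimate the abstract attributes to LPASS.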
Source journal
Journal of Information Security and Applications (Computer Science: Computer Networks and Communications)
CiteScore: 10.90
Self-citation rate: 5.40%
Articles published: 206
Review time: 56 days
Journal description: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications with relevance to information security and applications. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view on modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.