LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs

Impact Factor 3.7 · CAS Zone 2 (Computer Science) · JCR Q2 (Computer Science, Information Systems)
Luis Ibanez-Lissen, Lorena Gonzalez-Manzano, Jose Maria de Fuentes, Nicolas Anciaux
{"title":"LPASS:线性探针作为使用压缩llm进行漏洞检测的垫脚石","authors":"Luis Ibanez-Lissen ,&nbsp;Lorena Gonzalez-Manzano ,&nbsp;Jose Maria de Fuentes ,&nbsp;Nicolas Anciaux","doi":"10.1016/j.jisa.2025.104125","DOIUrl":null,"url":null,"abstract":"<div><div>Large Language Models (LLMs) are being extensively used for cybersecurity purposes. One of them is the detection of vulnerable codes. For the sake of efficiency and effectiveness, compression and fine-tuning techniques are being developed, respectively. However, they involve spending substantial computational efforts. In this vein, we analyze how Linear Probes (LPs) can be used to provide an estimation on the performance of a compressed LLM at an early phase — before fine-tuning. We also show their suitability to set the cut-off point when applying layer pruning compression. Our approach, dubbed <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>, is applied in BERT and Gemma for the detection of 12 of MITRE’s Top 25 most dangerous vulnerabilities on 480k C/C++ samples. LPs can be computed in 142.97 s. and provide key findings: (1) 33.3 % and 72.2% of layers can be removed, respectively, with no precision loss; (2) they provide an early estimate of the post-fine-tuning and post-compression model effectiveness, with 3% and 8.68% as the lowest and average precision errors, respectively. <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based LLMs outperform the state of the art, reaching 86.9% of accuracy in multi-class vulnerability detection. Interestingly, <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based compressed versions of Gemma outperform the original ones by 1.6% of F1-score at a maximum while saving 29.4 % and 23.8% of training and inference time and 42.98% of model size.</div></div>","PeriodicalId":48638,"journal":{"name":"Journal of Information Security and Applications","volume":"93 ","pages":"Article 104125"},"PeriodicalIF":3.7000,"publicationDate":"2025-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs\",\"authors\":\"Luis Ibanez-Lissen ,&nbsp;Lorena Gonzalez-Manzano ,&nbsp;Jose Maria de Fuentes ,&nbsp;Nicolas Anciaux\",\"doi\":\"10.1016/j.jisa.2025.104125\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Large Language Models (LLMs) are being extensively used for cybersecurity purposes. One of them is the detection of vulnerable codes. For the sake of efficiency and effectiveness, compression and fine-tuning techniques are being developed, respectively. However, they involve spending substantial computational efforts. In this vein, we analyze how Linear Probes (LPs) can be used to provide an estimation on the performance of a compressed LLM at an early phase — before fine-tuning. We also show their suitability to set the cut-off point when applying layer pruning compression. Our approach, dubbed <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>, is applied in BERT and Gemma for the detection of 12 of MITRE’s Top 25 most dangerous vulnerabilities on 480k C/C++ samples. LPs can be computed in 142.97 s. 
and provide key findings: (1) 33.3 % and 72.2% of layers can be removed, respectively, with no precision loss; (2) they provide an early estimate of the post-fine-tuning and post-compression model effectiveness, with 3% and 8.68% as the lowest and average precision errors, respectively. <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based LLMs outperform the state of the art, reaching 86.9% of accuracy in multi-class vulnerability detection. Interestingly, <span><math><mrow><mi>L</mi><mi>P</mi><mi>A</mi><mi>S</mi><mi>S</mi></mrow></math></span>-based compressed versions of Gemma outperform the original ones by 1.6% of F1-score at a maximum while saving 29.4 % and 23.8% of training and inference time and 42.98% of model size.</div></div>\",\"PeriodicalId\":48638,\"journal\":{\"name\":\"Journal of Information Security and Applications\",\"volume\":\"93 \",\"pages\":\"Article 104125\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Security and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2214212625001620\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Security and Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214212625001620","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Large Language Models (LLMs) are being extensively used for cybersecurity purposes, one of which is the detection of vulnerable code. For the sake of efficiency and effectiveness, compression and fine-tuning techniques are being developed, respectively. However, they involve substantial computational effort. In this vein, we analyze how Linear Probes (LPs) can be used to estimate the performance of a compressed LLM at an early phase, before fine-tuning. We also show their suitability for setting the cut-off point when applying layer-pruning compression. Our approach, dubbed LPASS, is applied to BERT and Gemma for the detection of 12 of MITRE's Top 25 most dangerous vulnerabilities on 480k C/C++ samples. LPs can be computed in 142.97 s and provide key findings: (1) 33.3% and 72.2% of layers can be removed, respectively, with no precision loss; (2) they provide an early estimate of post-fine-tuning and post-compression model effectiveness, with 3% and 8.68% as the lowest and average precision errors, respectively. LPASS-based LLMs outperform the state of the art, reaching 86.9% accuracy in multi-class vulnerability detection. Interestingly, LPASS-based compressed versions of Gemma outperform the original ones by up to 1.6% in F1-score while saving 29.4% and 23.8% of training and inference time, respectively, and 42.98% of model size.
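The core idea described in the abstract, fitting a lightweight linear probe on each layer's frozen representations to get an early estimate of post-compression effectiveness and to pick the layer-pruning cut-off, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the model name, the [CLS]-pooling choice, the logistic-regression probe, and the tolerance threshold are all illustrative assumptions.

```python
# Minimal sketch of the linear-probe idea: fit one linear classifier per
# transformer layer on frozen hidden states, then use per-layer probe
# accuracy to pick a layer-pruning cut-off before any fine-tuning.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # illustrative stand-in for the BERT/Gemma models in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

def layer_embeddings(code_samples, batch_size=16):
    """Return one [N, H] array of [CLS] embeddings per transformer layer."""
    per_layer = None
    with torch.no_grad():
        for i in range(0, len(code_samples), batch_size):
            batch = tokenizer(code_samples[i:i + batch_size], truncation=True,
                              padding=True, return_tensors="pt")
            hidden = model(**batch).hidden_states  # (n_layers + 1) tensors of shape [B, T, H]
            cls = [h[:, 0, :].cpu().numpy() for h in hidden[1:]]  # drop the embedding layer
            per_layer = cls if per_layer is None else [
                np.vstack([a, b]) for a, b in zip(per_layer, cls)]
    return per_layer

def probe_accuracies(train_x, train_y, val_x, val_y):
    """Fit one linear probe per layer on frozen embeddings; return validation accuracy per layer."""
    accs = []
    for tr, va in zip(train_x, val_x):  # tr, va: [N, H] arrays for one layer
        probe = LogisticRegression(max_iter=1000).fit(tr, train_y)
        accs.append(probe.score(va, val_y))
    return accs

def pruning_cutoff(accs, tolerance=0.01):
    """Earliest layer whose probe accuracy is within `tolerance` of the best
    layer; deeper layers become pruning candidates (threshold is illustrative)."""
    best = max(accs)
    for layer, acc in enumerate(accs, start=1):
        if acc >= best - tolerance:
            return layer
    return len(accs)

# Example usage (train/val code snippets and labels assumed to exist):
# train_emb = layer_embeddings(train_code); val_emb = layer_embeddings(val_code)
# accs = probe_accuracies(train_emb, train_labels, val_emb, val_labels)
# keep_up_to = pruning_cutoff(accs)  # layers above keep_up_to can be pruned
```

Under this sketch, the value returned by pruning_cutoff marks the deepest layer worth keeping: layers above it would be removed before fine-tuning, which is the kind of early, pre-fine-tuning estimate the abstract attributes to LPASS.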
Source journal
Journal of Information Security and Applications (Computer Science: Computer Networks and Communications)
CiteScore: 10.90
Self-citation rate: 5.40%
Articles published: 206
Review time: 56 days
Journal description: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications with relevance to information security and applications. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view on modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.