LLM-Enhanced Software Patch Localization

Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang
arXiv:2409.06816 · arXiv - CS - Cryptography and Security · Published 2024-09-10
Citations: 0

Abstract

Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are leading approaches to address this. However, existing SPL models often falter when a commit lacks a clear association with its corresponding CVE, and do not consider a scenario that a vulnerability has multiple patches proposed over time before it has been fully resolved. To address these challenges, we introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of the Large Language Model (LLM) to locate the security patch commit for a given CVE. More specifically, we propose a joint learning framework, in which the outputs of LLM serves as additional features to aid our recommendation model in prioritizing security patches. Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall, while significantly reducing manual effort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL significantly improves Recall by 22.83%, NDCG by 19.41%, and reduces manual effort by over 25% when checking up to the top 10 rankings. The dataset and source code are available at https://anonymous.4open.science/r/LLM-SPL-91F8.
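The abstract describes ranking candidate commits for a CVE by combining an LLM-derived relevance signal with a recommendation model's score, and reports results in Recall and NDCG at a rank cutoff. The sketch below is not the paper's implementation; the linear score blend, the 0.5 weight, and all commit names are illustrative assumptions, but the Recall@k and NDCG@k definitions are the standard ones used for such rankings.

```python
import math

def combine_scores(rec_score, llm_score, weight=0.5):
    # Blend a recommendation-model score with an LLM relevance score.
    # The linear blend and 0.5 weight are illustrative, not from the paper.
    return (1 - weight) * rec_score + weight * llm_score

def recall_at_k(ranked, relevant, k):
    # Fraction of the true patch commits found within the top-k ranking.
    hits = sum(1 for c in ranked[:k] if c in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    # NDCG@k with binary relevance (1 if the commit is a true patch).
    dcg = sum(1 / math.log2(i + 2)
              for i, c in enumerate(ranked[:k]) if c in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

# Hypothetical candidate commits for one CVE: (recommendation, LLM) scores.
candidates = {
    "c1": (0.40, 0.90),  # true patch, surfaced mainly by the LLM signal
    "c2": (0.80, 0.10),
    "c3": (0.35, 0.85),  # second true patch (multi-patch vulnerability)
    "c4": (0.60, 0.20),
}
relevant = {"c1", "c3"}
ranked = sorted(candidates, key=lambda c: combine_scores(*candidates[c]),
                reverse=True)
print(ranked)                                  # both true patches rank first
print(recall_at_k(ranked, relevant, 2))        # 1.0
print(round(ndcg_at_k(ranked, relevant, 2), 3))  # 1.0
```

In this toy example the LLM signal lifts both true patches above the commit the base score alone would have ranked first, which mirrors the multi-patch scenario the paper targets.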