LLM-Enhanced Software Patch Localization

arXiv - CS - Cryptography and Security Pub Date : 2024-09-10 DOI:arxiv-2409.06816

Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang

{"title":"LLM-Enhanced Software Patch Localization","authors":"Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang","doi":"arxiv-2409.06816","DOIUrl":null,"url":null,"abstract":"Open source software (OSS) is integral to modern product development, and any\nvulnerability within it potentially compromises numerous products. While\ndevelopers strive to apply security patches, pinpointing these patches among\nextensive OSS updates remains a challenge. Security patch localization (SPL)\nrecommendation methods are leading approaches to address this. However,\nexisting SPL models often falter when a commit lacks a clear association with\nits corresponding CVE, and do not consider a scenario that a vulnerability has\nmultiple patches proposed over time before it has been fully resolved. To\naddress these challenges, we introduce LLM-SPL, a recommendation-based SPL\napproach that leverages the capabilities of the Large Language Model (LLM) to\nlocate the security patch commit for a given CVE. More specifically, we propose\na joint learning framework, in which the outputs of LLM serves as additional\nfeatures to aid our recommendation model in prioritizing security patches. Our\nevaluation on a dataset of 1,915 CVEs associated with 2,461 patches\ndemonstrates that LLM-SPL excels in ranking patch commits, surpassing the\nstate-of-the-art method in terms of Recall, while significantly reducing manual\neffort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL\nsignificantly improves Recall by 22.83\\%, NDCG by 19.41\\%, and reduces manual\neffort by over 25\\% when checking up to the top 10 rankings. The dataset and\nsource code are available at\n\\url{https://anonymous.4open.science/r/LLM-SPL-91F8}.","PeriodicalId":501332,"journal":{"name":"arXiv - CS - Cryptography and Security","volume":"57 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Cryptography and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06816","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are leading approaches to address this. However, existing SPL models often falter when a commit lacks a clear association with its corresponding CVE, and do not consider a scenario that a vulnerability has multiple patches proposed over time before it has been fully resolved. To address these challenges, we introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of the Large Language Model (LLM) to locate the security patch commit for a given CVE. More specifically, we propose a joint learning framework, in which the outputs of LLM serves as additional features to aid our recommendation model in prioritizing security patches. Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall, while significantly reducing manual effort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL significantly improves Recall by 22.83\%, NDCG by 19.41\%, and reduces manual effort by over 25\% when checking up to the top 10 rankings. The dataset and source code are available at \url{https://anonymous.4open.science/r/LLM-SPL-91F8}.

查看原文本刊更多论文

LLM 增强型软件补丁本地化

开放源码软件（OSS）是现代产品开发不可或缺的一部分，其中的任何漏洞都有可能危及众多产品。虽然开发人员努力应用安全补丁，但在广泛的开放源码软件更新中精确定位这些补丁仍然是一项挑战。安全补丁本地化（SPL）推荐方法是解决这一问题的主要方法。然而，现有的 SPL 模型在提交与相应的 CVE 缺乏明确关联时往往会出现问题，而且不会考虑漏洞在被完全解决之前会随着时间的推移被提出多个补丁的情况。为了应对这些挑战，我们引入了 LLM-SPL，这是一种基于推荐的 SPL 方法，它利用大语言模型（LLM）的功能来定位给定 CVE 的安全补丁提交。更具体地说，我们提出了一个联合学习框架，其中 LLM 的输出可作为额外特征，帮助我们的推荐模型确定安全补丁的优先级。对与 2,461 个补丁相关联的 1,915 个 CVE 数据集进行的评估表明，LLM-SPL 在补丁提交排序方面表现出色，在召回率方面超过了最先进的方法，同时大大减少了人工操作的工作量。值得注意的是，对于需要多个补丁的漏洞，LLM-SPL显著提高了22.83%的召回率（Recall）和19.41%的NDCG（NDCG），并且在检查前10个排名时减少了超过25%的人工工作量。数据集和源代码请访问：url{https://anonymous.4open.science/r/LLM-SPL-91F8}。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

arXiv - CS - Cryptography and Security

自引率

0.00%

发文量