Does the Vulnerability Threaten Our Projects? Automated Vulnerable API Detection for Third-Party Libraries

IF 6.5 · Zone 1 Computer Science · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering · Pub Date: 2024-09-05 · DOI: 10.1109/TSE.2024.3454960

Fangyuan Zhang; Lingling Fan; Sen Chen; Miaoying Cai; Sihan Xu; Lida Zhao

IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2906-2920. Journal Article. https://ieeexplore.ieee.org/document/10666791/

Citations: 0

Abstract

Developers commonly use third-party libraries (TPLs) to facilitate development and avoid reinventing the wheel. However, vulnerable TPLs pose severe security threats. Most existing research considers only whether a project uses a vulnerable TPL and neglects whether the project actually uses the TPL's vulnerable code, which inevitably produces false positives and incurs additional patching effort and maintenance cost (e.g., dependency conflicts after version upgrades). To mitigate this problem, we propose VAScanner, which can effectively identify the vulnerable root methods that cause vulnerabilities in TPLs and then identify all vulnerable APIs of TPLs used by Java projects. Specifically, we first collect the initial patch methods from patch commits and extract accurate patch methods with a patch-unrelated sifting mechanism; we then identify the vulnerable root methods for each vulnerability with an augmentation mechanism. Based on these root methods, we use backward call graph analysis to identify all vulnerable APIs for each vulnerable TPL version and construct a database of 90,749 vulnerable APIs (2,410,779 when counted with library versions) from 362 TPLs with 14,775 versions, with a false positive proportion of 1.45% (95% confidence interval (CI): [1.31%, 1.59%]). The database serves as a reference that helps developers detect vulnerable APIs of TPLs used by their projects. Our experiments show that VAScanner eliminates 5.78% false positives and 2.16% false negatives owing to the proposed sifting and augmentation mechanisms. It also outperforms Eclipse Steady, the state-of-the-art method-level vulnerability detection tool, in analyzing direct dependencies, achieving more effective detection of vulnerable APIs. Furthermore, to investigate the real impact of vulnerabilities on real open-source projects, we use VAScanner to conduct a large-scale analysis of 3,147 projects that depend on vulnerable TPLs and find that only 21.51% of the projects (with a 1.83% false positive proportion and a 95% CI of [0.71%, 4.61%]) were threatened through vulnerable APIs, demonstrating that VAScanner can potentially reduce false positives significantly.
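The backward call graph step described in the abstract can be illustrated with a minimal sketch. This is not VAScanner's implementation: the function names, the edge-list representation of the call graph, and the example library methods (`Lib.parse`, `Parser.readEntity`, and so on) are all hypothetical, chosen only to show how walking caller edges backward from vulnerable root methods yields the set of public APIs that can reach them, and how a project's calls can then be matched against that set.

```python
from collections import defaultdict, deque


def build_reverse_graph(call_edges):
    """Map each callee to the set of methods that call it directly."""
    callers = defaultdict(set)
    for caller, callee in call_edges:
        callers[callee].add(caller)
    return callers


def vulnerable_apis(call_edges, root_methods, public_methods):
    """Return the public methods that can reach any vulnerable root method.

    Breadth-first search over reversed call edges, starting at the roots.
    A root that is itself public is also reported.
    """
    callers = build_reverse_graph(call_edges)
    seen = set(root_methods)
    queue = deque(root_methods)
    while queue:
        method = queue.popleft()
        for caller in callers.get(method, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen & set(public_methods)


def threatened_calls(project_calls, vulnerable):
    """Return the library APIs a project calls that are marked vulnerable."""
    return sorted(set(project_calls) & set(vulnerable))


# Hypothetical library call graph as (caller, callee) pairs.
edges = [
    ("Lib.load", "Lib.parse"),
    ("Lib.parse", "Parser.readEntity"),
    ("Parser.readEntity", "Parser.expand"),
    ("Lib.version", "Util.now"),
]
roots = {"Parser.readEntity"}                      # assumed vulnerable root method
public = {"Lib.load", "Lib.parse", "Lib.version"}  # assumed public API surface

api_db = vulnerable_apis(edges, roots, public)
hits = threatened_calls(["Lib.version", "Lib.load"], api_db)
print(sorted(api_db), hits)
```

In this toy graph a project that only calls `Lib.version` depends on the vulnerable library without reaching the vulnerable code, which is the kind of false positive a library-level check would report and an API-level check would not.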




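Source journal

IEEE Transactions on Software Engineering (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Publication count: 724

Review time: 6 months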

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.