{"title":"Does the Vulnerability Threaten Our Projects? Automated Vulnerable API Detection for Third-Party Libraries","authors":"Fangyuan Zhang;Lingling Fan;Sen Chen;Miaoying Cai;Sihan Xu;Lida Zhao","doi":"10.1109/TSE.2024.3454960","DOIUrl":null,"url":null,"abstract":"Developers usually use third-party libraries (TPLs) to facilitate the development of their projects to avoid reinventing the wheels, however, the vulnerable TPLs indeed cause severe security threats. The majority of existing research only considered whether projects used vulnerable TPLs but neglected whether the vulnerable code of the TPLs was indeed used by the projects, which inevitably results in false positives and further requires additional patching efforts and maintenance costs (e.g., dependency conflict issues after version upgrades). To mitigate such a problem, we propose \n<monospace>VAScanner</monospace>\n, which can effectively identify vulnerable root methods causing vulnerabilities in TPLs and further identify all vulnerable APIs of TPLs used by Java projects. Specifically, we first collect the initial patch methods from the patch commits and extract accurate patch methods by employing a patch-unrelated sifting mechanism, then we further identify the vulnerable root methods for each vulnerability by employing an augmentation mechanism. Based on them, we leverage backward call graph analysis to identify all vulnerable APIs for each vulnerable TPL version and construct a database consisting of 90,749 (2,410,779 with library versions) vulnerable APIswith 1.45% false positive proportion with a 95% confidence interval (CI) of [1.31%, 1.59%] from 362 TPLs with 14,775 versions. The database serves as a reference database to help developers detect vulnerable APIs of TPLs used by projects. Our experiments show \n<monospace>VAScanner</monospace>\n eliminates 5.78% false positives and 2.16% false negatives owing to the proposed sifting and augmentation mechanisms. Besides, it outperforms the state-of-the-art method-level vulnerability detection tool in analyzing direct dependencies, Eclipse Steady, achieving more effective detection of vulnerable APIs. Furthermore, to investigate the real impact of vulnerabilities on real open-source projects, we exploit \n<monospace>VAScanner</monospace>\n to conduct a large-scale analysis on 3,147 projects that depend on vulnerable TPLs, and find only 21.51% of projects (with 1.83% false positive proportion and a 95% CI of [0.71%, 4.61%]) were threatened through vulnerable APIs, demonstrating that \n<monospace>VAScanner</monospace>\n can potentially reduce false positives significantly.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 11","pages":"2906-2920"},"PeriodicalIF":6.5000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10666791/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Developers usually use third-party libraries (TPLs) to facilitate the development of their projects to avoid reinventing the wheels, however, the vulnerable TPLs indeed cause severe security threats. The majority of existing research only considered whether projects used vulnerable TPLs but neglected whether the vulnerable code of the TPLs was indeed used by the projects, which inevitably results in false positives and further requires additional patching efforts and maintenance costs (e.g., dependency conflict issues after version upgrades). To mitigate such a problem, we propose
VAScanner
, which can effectively identify vulnerable root methods causing vulnerabilities in TPLs and further identify all vulnerable APIs of TPLs used by Java projects. Specifically, we first collect the initial patch methods from the patch commits and extract accurate patch methods by employing a patch-unrelated sifting mechanism, then we further identify the vulnerable root methods for each vulnerability by employing an augmentation mechanism. Based on them, we leverage backward call graph analysis to identify all vulnerable APIs for each vulnerable TPL version and construct a database consisting of 90,749 (2,410,779 with library versions) vulnerable APIswith 1.45% false positive proportion with a 95% confidence interval (CI) of [1.31%, 1.59%] from 362 TPLs with 14,775 versions. The database serves as a reference database to help developers detect vulnerable APIs of TPLs used by projects. Our experiments show
VAScanner
eliminates 5.78% false positives and 2.16% false negatives owing to the proposed sifting and augmentation mechanisms. Besides, it outperforms the state-of-the-art method-level vulnerability detection tool in analyzing direct dependencies, Eclipse Steady, achieving more effective detection of vulnerable APIs. Furthermore, to investigate the real impact of vulnerabilities on real open-source projects, we exploit
VAScanner
to conduct a large-scale analysis on 3,147 projects that depend on vulnerable TPLs, and find only 21.51% of projects (with 1.83% false positive proportion and a 95% CI of [0.71%, 4.61%]) were threatened through vulnerable APIs, demonstrating that
VAScanner
can potentially reduce false positives significantly.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.