Towards Robust Detection of Open Source Software Supply Chain Poisoning Attacks in Industry Environments
Xinyi Zheng, Chen Wei, Shenao Wang, Yanjie Zhao, Peiming Gao, Yuanchao Zhang, Kailong Wang, Haoyu Wang
arXiv - CS - Software Engineering, 2024-09-14
DOI: arxiv-2409.09356 (https://doi.org/arxiv-2409.09356)
Citation count: 0
Abstract
The exponential growth of open-source package ecosystems, particularly NPM
and PyPI, has led to an alarming increase in software supply chain poisoning
attacks. Existing static analysis methods struggle with high false positive
rates and are easily thwarted by obfuscation and dynamic code execution
techniques. While dynamic analysis approaches offer improvements, they often
capture behaviors that do not originate from the package itself and rely on
simplistic testing strategies that fail to trigger sophisticated malicious
behaviors. To address
these challenges, we present OSCAR, a robust dynamic code poisoning detection
pipeline for NPM and PyPI ecosystems. OSCAR fully executes packages in a
sandbox environment, employs fuzz testing on exported functions and classes,
and implements aspect-based behavior monitoring with tailored API hook points.
We evaluate OSCAR against six existing tools using a comprehensive benchmark
dataset of real-world malicious and benign packages. OSCAR achieves an F1 score
of 0.95 on NPM and 0.91 on PyPI, indicating that it is as effective as current
state-of-the-art tools. Furthermore, for benign packages that exhibit
characteristics typical of malicious packages, OSCAR reduces the false positive
rate by an average of 32.06 percentage points on NPM (from 34.63% to 2.57%) and
39.87 percentage points on PyPI (from 41.10% to 1.23%) compared to other tools,
significantly reducing the manual review workload in real-world deployments. In
cooperation with Ant Group, a leading financial technology company, we have
run OSCAR on its NPM and PyPI mirrors since January 2023, identifying
10,404 malicious NPM packages and 1,235 malicious PyPI packages over 18 months.
This work not only bridges the gap between academic research and industrial
application in code poisoning detection but also provides a robust and
practical solution that has been thoroughly tested in a real-world industrial
setting.
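
The combination of exercising exported functions and monitoring behavior at API hook points can be illustrated with a minimal Python sketch. This is not OSCAR's implementation: the hook list (`os.system`, `os.remove`), the fixed input pool, and the names `monitor`, `fuzz_exports`, and `demo_pkg` are all assumptions made for illustration, and a deterministic input sweep stands in for a real fuzzer. The sketch covers functions only, not classes, and provides no sandboxing.

```python
import inspect
import itertools
import os
import types
from contextlib import contextmanager

# Hypothetical hook points chosen for illustration only.
HOOK_POINTS = [(os, "system"), (os, "remove")]

# Small fixed input pool; a real fuzzer would generate far richer inputs.
INPUT_POOL = [0, -1, "", "a" * 64, None, b"\x00"]


@contextmanager
def monitor(log):
    """Swap each hook point for a recorder, then restore the originals."""
    originals = [(owner, name, getattr(owner, name)) for owner, name in HOOK_POINTS]
    for owner, name, _ in originals:
        def recorder(*args, _label=f"{owner.__name__}.{name}", **kwargs):
            log.append((_label, args))
            return 0  # report success without running the real API
        setattr(owner, name, recorder)
    try:
        yield
    finally:
        for owner, name, original in originals:
            setattr(owner, name, original)


def fuzz_exports(module, max_calls=50):
    """Call each public function with pooled inputs; return the hooked calls seen."""
    log = []
    with monitor(log):
        for name, func in inspect.getmembers(module, inspect.isfunction):
            if name.startswith("_"):
                continue
            arity = len(inspect.signature(func).parameters)
            combos = itertools.product(INPUT_POOL, repeat=arity)
            for args in itertools.islice(combos, max_calls):
                try:
                    func(*args)
                except Exception:
                    pass  # crashes are expected; only hooked behavior is of interest
    return log


# Hypothetical package whose payload fires only for one unusual input.
demo_pkg = types.ModuleType("demo_pkg")

def resize(width, height):
    if width == -1:
        os.system("echo payload")  # intercepted by the recorder during fuzzing
    return (width, height)

demo_pkg.resize = resize

observed = fuzz_exports(demo_pkg)
print(observed[:1])
```

Importing `demo_pkg` alone would reveal nothing, because the payload sits behind an input check; only calling `resize` with varied arguments reaches it, which is the motivation the abstract gives for testing exported functions rather than relying on import-time execution.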