{"title":"What the Fork? Finding Hidden Code Clones in npm","authors":"Elizabeth Wyss, Lorenzo De Carli, Drew Davidson","doi":"10.1145/3510003.3510168","DOIUrl":null,"url":null,"abstract":"This work presents findings and mitigations on an under-studied issue, which we term shrinkwrapped clones, that is endemic to the npm software package ecosystem. A shrink-wrapped clone is a package which duplicates, or near-duplicates, the code of another package without any indication or refer-ence to the original package. This phenomenon represents a challenge to the hygiene of package ecosystems, as a clone package may siphon interest from the package being cloned, or create hidden duplicates of vulnerable, insecure code which can fly under the radar of audit processes. Motivated by these considerations, we propose UNWRAP-PER, a mechanism to programmatically detect shrinkwrapped clones and match them to their source package. UNWRAP-PER uses a package difference metric based on directory tree similarity, augmented with a prefilter which quickly weeds out packages unlikely to be clones of a target. Overall, our prototype can compare a given package within the entire npm ecosystem (1,716,061 packages with 20,190,452 differ-ent versions) in 72.85 seconds, and it is thus practical for live deployment. Using our tool, we performed an analysis of a subset of npm packages, which resulted in finding up to 6,292 previously unknown shrinkwrapped clones, of which up to 207 carried vulnerabilities from the original package that had already been fixed in the original package. None of such vulnerabilities were discoverable via the standard npm audit process.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510003.3510168","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
This work presents findings and mitigations on an under-studied issue, which we term shrinkwrapped clones, that is endemic to the npm software package ecosystem. A shrink-wrapped clone is a package which duplicates, or near-duplicates, the code of another package without any indication or refer-ence to the original package. This phenomenon represents a challenge to the hygiene of package ecosystems, as a clone package may siphon interest from the package being cloned, or create hidden duplicates of vulnerable, insecure code which can fly under the radar of audit processes. Motivated by these considerations, we propose UNWRAP-PER, a mechanism to programmatically detect shrinkwrapped clones and match them to their source package. UNWRAP-PER uses a package difference metric based on directory tree similarity, augmented with a prefilter which quickly weeds out packages unlikely to be clones of a target. Overall, our prototype can compare a given package within the entire npm ecosystem (1,716,061 packages with 20,190,452 differ-ent versions) in 72.85 seconds, and it is thus practical for live deployment. Using our tool, we performed an analysis of a subset of npm packages, which resulted in finding up to 6,292 previously unknown shrinkwrapped clones, of which up to 207 carried vulnerabilities from the original package that had already been fixed in the original package. None of such vulnerabilities were discoverable via the standard npm audit process.