Wolf in Sheep’s Clothing: Shearing the Camouflage of Malicious Java Components in Maven
Authors: Yutong Zeng; Cheng Huang; Jiaxuan Han; Jianguo Zhao; Nannan Wang; Genpei Liang; Shuyi Jiang
Journal: IEEE Transactions on Software Engineering, vol. 51, no. 10, pp. 2847-2863 (Journal Article, published 2025-08-19)
DOI: 10.1109/TSE.2025.3599732
URL: https://ieeexplore.ieee.org/document/11129930/
Journal metrics: impact factor 5.6; JCR Q1 (Computer Science, Software Engineering)
Citation count: 0
Abstract
In recent years, software supply chain attacks have become increasingly prevalent, prompting considerable research into detecting malicious packages in package repositories. Bolstered by the widespread adoption of open-source practices, Java has become one of the preferred languages among modern developers. However, malware detection in Java components remains an unresolved problem. Most prior approaches suffer from insufficient code coverage and coarse-grained representation, making them unsuitable for Java components. In this paper, we propose Shear, a solution tailored for detecting malicious Java components. First, Shear analyzes all methods in the component and locates potentially malicious code snippets based on sensitive calls, since slice-level analysis provides a better understanding of the specific malicious activities. Second, statements depending on sensitive call sites are extracted and embedded into vectors for further detection, in place of a function-level representation, which is too coarse-grained given Java's dynamic features. Experimental results show that Shear effectively identifies the malicious semantics hidden in the code slices by leveraging a neural network model, outperforming currently available tools by a wide margin. In real-world validation, Shear detected 51 components with malicious characteristics out of 68,273, demonstrating its practical feasibility. This study introduces the first Java malicious component detection method suitable for real-world scenarios, with considerable practical significance for bolstering defenses within the software supply chain.
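The slicing idea in the abstract can be illustrated with a toy sketch. This is not Shear's implementation; the abstract does not give its algorithm, its list of sensitive APIs, or its program representation. The sketch assumes a straight-line method in which each statement is annotated by hand with the variables it defines and uses, and it treats "depending on a sensitive call site" as def-use reachability in both directions. The names `SENSITIVE_CALLS`, `Stmt`, `stmt`, `find_sensitive_sites`, and `slice_around` are all hypothetical, and the embedding and classification steps are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Hypothetical sensitive-API list; the paper's actual list is not in the abstract.
SENSITIVE_CALLS = {"java.lang.Runtime.exec", "java.net.URL.openConnection"}


@dataclass(frozen=True)
class Stmt:
    text: str                          # Java source text of the statement
    defs: FrozenSet[str] = frozenset() # variables written by the statement
    uses: FrozenSet[str] = frozenset() # variables read by the statement
    call: Optional[str] = None         # fully qualified callee, if any


def stmt(text: str, defs: str = "", uses: str = "", call: Optional[str] = None) -> Stmt:
    """Build a Stmt from space-separated variable names."""
    return Stmt(text, frozenset(defs.split()), frozenset(uses.split()), call)


def find_sensitive_sites(stmts: List[Stmt]) -> List[int]:
    """Indices of statements that invoke a sensitive API."""
    return [i for i, s in enumerate(stmts) if s.call in SENSITIVE_CALLS]


def slice_around(stmts: List[Stmt], site: int) -> List[int]:
    """Indices of statements linked to `site` by def-use chains.

    Assumes straight-line code (no branches or loops). The forward pass
    ignores redefinitions, so it may keep more statements than necessary.
    """
    keep = {site}
    # Backward pass: statements that define values flowing into the call.
    needed = set(stmts[site].uses)
    for i in range(site - 1, -1, -1):
        s = stmts[i]
        if s.defs & needed:
            keep.add(i)
            needed = (needed - s.defs) | s.uses
    # Forward pass: statements that consume values produced by the call.
    tainted = set(stmts[site].defs)
    for i in range(site + 1, len(stmts)):
        s = stmts[i]
        if s.uses & tainted:
            keep.add(i)
            tainted |= s.defs
    return sorted(keep)


# Hand-annotated example method body (illustrative only).
METHOD = [
    stmt('String host = "example.invalid";', defs="host"),
    stmt("int retries = 3;", defs="retries"),
    stmt('String cmd = "ping " + host;', defs="cmd", uses="host"),
    stmt("Process p = Runtime.getRuntime().exec(cmd);",
         defs="p", uses="cmd", call="java.lang.Runtime.exec"),
    stmt('log("retries=" + retries);', uses="retries"),
    stmt("InputStream out = p.getInputStream();", defs="out", uses="p"),
]

for site in find_sensitive_sites(METHOD):
    for i in slice_around(METHOD, site):
        print(i, METHOD[i].text)
```

In this example the slice around the `exec` call keeps statements 0, 2, 3, and 5 and drops the two statements about `retries`, which is the sense in which a slice is finer-grained than the whole function. A real analysis of Java bytecode or source would also need control dependence, aliasing, and inter-procedural flow, none of which this sketch attempts.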
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.