ADTDroid: Leveraging API description and TCP based active learning for Android malware detection
Authors: Zhen Liu, Ruoyu Wang, Wenbin Zhang
Journal: Information and Software Technology, Volume 189, Article 107930 (Impact Factor 4.3; JCR Q2, Computer Science, Information Systems)
Publication date: 2025-10-14
DOI: 10.1016/j.infsof.2025.107930
URL: https://www.sciencedirect.com/science/article/pii/S0950584925002691
Context:
Extensive research has been conducted on neural network-based Android malware detection models to safeguard the Android software ecosystem. However, the efficacy of detection models may decline over time due to the continuous evolution of malicious behaviors, a phenomenon referred to as the model aging problem.
Objective:
To tackle this problem, existing research primarily focuses on API semantic feature learning and active learning. However, a major challenge in feature learning is the continuous updating of APIs. Additionally, the over-confidence problem in neural networks makes it harder to select uncertain samples during active learning. To handle these challenges, this paper proposes a novel Android malware detection method called ADTDroid. It aims to keep malware detection models effective in the face of ongoing API updates and malware evolution.
Method:
In this paper, we present a feature extraction approach based on sensitive event graphs that prioritizes suspicious APIs. To derive API embeddings for feature vector extraction, we propose learning these embeddings directly from the API descriptions provided in the official Android development documentation. Embeddings for updated APIs can therefore be obtained from the documentation immediately. Furthermore, we propose a True Class Probability (TCP)-based confidence score to identify uncertain samples for model retraining. Because these samples exhibit genuine uncertainty, retraining on them enhances the model’s adaptability to evolving data.
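The motivation for a TCP-based score can be illustrated with a small sketch. It assumes the usual definition from the failure-prediction literature: TCP is the probability the model assigns to the correct class, whereas the common baseline, maximum class probability (MCP), is simply the largest softmax output. The probabilities, labels, and function names below are hypothetical and are not taken from the paper. True labels are not available for unlabeled samples, so in practice TCP has to be estimated (for example by an auxiliary confidence model); how ADTDroid does this is not stated in the abstract, and the sketch uses known labels only to show why the two scores rank samples differently.

```python
from typing import List, Sequence


def max_class_probability(probs: Sequence[float]) -> float:
    """MCP: the largest softmax output, the usual confidence proxy."""
    return max(probs)


def true_class_probability(probs: Sequence[float], true_label: int) -> float:
    """TCP: the probability the model assigns to the correct class."""
    return probs[true_label]


def select_uncertain(confidences: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` samples with the lowest confidence."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return order[:budget]


# Hypothetical softmax outputs [P(benign), P(malware)] for four apps.
probs = [
    [0.95, 0.05],  # benign, correctly predicted benign
    [0.92, 0.08],  # malware, confidently misclassified as benign
    [0.40, 0.60],  # malware, predicted malware with low margin
    [0.10, 0.90],  # malware, correctly predicted malware
]
labels = [0, 1, 1, 1]  # 0 = benign, 1 = malware (hypothetical ground truth)

mcp = [max_class_probability(p) for p in probs]
tcp = [true_class_probability(p, y) for p, y in zip(probs, labels)]

picked_by_mcp = select_uncertain(mcp, budget=2)
picked_by_tcp = select_uncertain(tcp, budget=2)

print("MCP selects:", picked_by_mcp)  # MCP selects: [2, 3]
print("TCP selects:", picked_by_tcp)  # TCP selects: [1, 2]
```

In this toy setting MCP never selects sample 1, because the model is over-confident in its wrong prediction, while the TCP ranking places that sample first. This is the behavior the over-confidence argument above relies on.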
Results:
In extensive experiments on large-scale real-world datasets spanning 2013 to 2022, our method achieves significant improvements in the F-score of malware detection. Compared to existing active learning-based approaches, our method achieves relative improvements of approximately 10% over APIGraph and 8.1% over contrastive autoencoder techniques.
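For readers unfamiliar with the term, a relative improvement is measured against the baseline score rather than as an absolute difference in percentage points. A minimal sketch follows; the F-scores in it are hypothetical placeholders chosen for easy arithmetic and are not the paper's results.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Relative gain of new_score over baseline_score, as a fraction."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (new_score - baseline_score) / baseline_score


# Hypothetical F-scores, NOT taken from the paper.
baseline_f = 0.80
new_f = 0.88

gain = relative_improvement(new_f, baseline_f)
print(f"{gain:.1%}")  # prints 10.0%
```

Here an absolute gain of 8 percentage points corresponds to a 10% relative improvement, because the gain is divided by the baseline of 0.80.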
Conclusion:
ADTDroid can enhance the performance of feature extraction in cases of model aging. It can also improve the selection of uncertain samples to adapt the malware detection model to new data.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.