Huijuan Zhu;Xilong Chen;Liangmin Wang;Zhicheng Xu;Victor S. Sheng
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 7483-7496 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 8.9)
Published: 2024-08-09
DOI: 10.1109/TKDE.2024.3436891
URL: https://ieeexplore.ieee.org/document/10632781/
A Dynamic Analysis-Powered Explanation Framework for Malware Detection
Deep learning has been widely adopted in Android malicious software (malware) detection. However, the poor explainability of deep learning-based detection models severely undermines user trust and poses a significant obstacle to their practical adoption in critical security domains. Some studies strive to uncover the rationale behind a model's decision. Unfortunately, these efforts are often hindered by the limitations of their feature extraction methods, such as relying primarily on static analysis, which derives only separate and approximate behavioral descriptions of applications (apps). As a result, establishing a reliable interpretation for deep learning-based malware detection models remains an open issue. In this work, we propose a novel framework, XDeepMal, to interpret deep learning-based malware detection models. Specifically, in XDeepMal, we develop a dynamic analysis tool, XTracer+, to capture the runtime behaviors of apps and automatically generate their continuous behavior trajectories. Then, we propose a novel interpreter to pinpoint certain behavior fragments that are crucial for deep learning models to make their decisions. This approach treats the identification of the most critical fragments as an optimization problem and solves it with heuristic algorithms. We conduct extensive experiments on a real-world dataset to investigate the effectiveness and reliability of XDeepMal. These experiments cover intuitive case studies (at the level of a malware family and of an individual app) and in-depth quantitative analysis. Additionally, we evaluate its coverage and efficiency. Our experimental results demonstrate that XDeepMal can generate convincing interpretations for deep learning-based models (e.g., Transformer) within feasible inference time, which greatly helps security analysts understand why an app is identified as malware by deep learning-based detection models.
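To make the idea of treating fragment identification as an optimization problem more concrete, here is a minimal sketch. The abstract does not say which heuristic XDeepMal uses, so the code substitutes a plain greedy search; the function names, the fixed-size windowing, the `<PAD>` masking token, and the toy scoring function are all hypothetical and are not taken from the paper. The objective assumed here is to find the few trajectory fragments whose masking lowers a detector's malware score the most.

```python
from typing import Callable, List, Sequence, Tuple

Fragment = Tuple[int, int]  # half-open [start, end) index range in a trajectory


def split_fragments(trajectory: Sequence[str], size: int) -> List[Fragment]:
    """Cut a trajectory into consecutive windows of at most `size` events."""
    return [(i, min(i + size, len(trajectory)))
            for i in range(0, len(trajectory), size)]


def mask(trajectory: Sequence[str], fragments: List[Fragment],
         pad: str = "<PAD>") -> List[str]:
    """Return a copy of the trajectory with the given fragments blanked out."""
    out = list(trajectory)
    for start, end in fragments:
        for i in range(start, end):
            out[i] = pad
    return out


def greedy_critical_fragments(trajectory: Sequence[str],
                              score: Callable[[Sequence[str]], float],
                              size: int, budget: int) -> List[Fragment]:
    """Greedily pick up to `budget` fragments whose masking lowers `score` most.

    This is an assumed stand-in heuristic, not the algorithm from the paper.
    """
    candidates = split_fragments(trajectory, size)
    chosen: List[Fragment] = []
    current = score(trajectory)
    for _ in range(budget):
        best, best_score = None, current
        for frag in candidates:
            if frag in chosen:
                continue
            s = score(mask(trajectory, chosen + [frag]))
            if s < best_score:  # keep only strict improvements
                best, best_score = frag, s
        if best is None:  # no remaining fragment lowers the score further
            break
        chosen.append(best)
        current = best_score
    return chosen


def toy_score(trajectory: Sequence[str]) -> float:
    """Hypothetical detector: share of events on a made-up suspicious list."""
    suspicious = {"getDeviceId", "sendTextMessage"}
    if not trajectory:
        return 0.0
    return sum(event in suspicious for event in trajectory) / len(trajectory)


trace = ["onCreate", "openConnection", "getDeviceId",
         "sendTextMessage", "onPause", "onStop"]
critical = greedy_critical_fragments(trace, toy_score, size=2, budget=2)
```

In this toy trace, `critical` holds the single window covering `getDeviceId` and `sendTextMessage`, because masking it alone drives the toy score to zero and no further window improves on that. A real setup would replace `toy_score` with the malware probability of the trained detector and would likely need a more capable search than this greedy pass.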
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.