Lukas Pirch, Alexander Warnecke, Christian Wressnegger, Konrad Rieck
{"title":"TagVet","authors":"Lukas Pirch, Alexander Warnecke, Christian Wressnegger, Konrad Rieck","doi":"10.1145/3447852.3458719","DOIUrl":null,"url":null,"abstract":"When managing large malware collections, it is common practice to use short tags for grouping and organizing samples. For example, collected malware is often tagged according to its origin, family, functionality, or clustering. While these simple tags are essential for keeping abreast of the rapid malware development, they can become disconnected from the actual behavior of the samples and, in the worst case, mislead the analyst. In particular, if tags are automatically assigned, it is often unclear whether they indeed align with the malware functionality. In this paper, we propose a method for vetting tags in malware collections. Our method builds on recent techniques of explainable machine learning, which enable us to automatically link tags to behavioral patterns observed during dynamic analysis. To this end, we train a neural network to classify different tags and trace back its decision to individual system calls and arguments. We empirically evaluate our method on tags for malware functionality, families, and clusterings. Our results demonstrate the utility of this approach and pinpoint interesting relations of malware tags in practice.","PeriodicalId":329372,"journal":{"name":"Proceedings of the 14th European Workshop on Systems Security","volume":"161 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th European Workshop on Systems Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447852.3458719","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
When managing large malware collections, it is common practice to use short tags for grouping and organizing samples. For example, collected malware is often tagged according to its origin, family, functionality, or clustering. While these simple tags are essential for keeping abreast of the rapid malware development, they can become disconnected from the actual behavior of the samples and, in the worst case, mislead the analyst. In particular, if tags are automatically assigned, it is often unclear whether they indeed align with the malware functionality. In this paper, we propose a method for vetting tags in malware collections. Our method builds on recent techniques of explainable machine learning, which enable us to automatically link tags to behavioral patterns observed during dynamic analysis. To this end, we train a neural network to classify different tags and trace back its decision to individual system calls and arguments. We empirically evaluate our method on tags for malware functionality, families, and clusterings. Our results demonstrate the utility of this approach and pinpoint interesting relations of malware tags in practice.