CAD-FSL: Code-Aware Data Generation based Few-Shot Learning for Efficient Malware Detection
Sreenitha Kasarapu, Sanket Shukla, Rakibul Hassan, Avesta Sasan, H. Homayoun, Sai Manoj Pudukotai Dinakarrao
Proceedings of the Great Lakes Symposium on VLSI 2022. Published 2022-06-06. DOI: 10.1145/3526241.3530825 (https://doi.org/10.1145/3526241.3530825)
Citations: 5
Abstract
Malicious software, a.k.a. malware, is one of the pivotal security threats to embedded computing systems. Machine Learning (ML) has been widely adopted for malware detection in recent times because of its efficiency and efficacy. Despite being efficient, existing techniques require the ML model to be updated frequently with newer benign and malware samples in order to train an effective malware detector. This requirement also limits the detection of emerging malware, for which too few samples are available for efficient training. To address these concerns, we introduce CAD-FSL, a code-aware data generation-based few-shot learning technique. CAD-FSL generates multiple mutated samples of malware that has been seen only a limited number of times. Loss minimization ensures that the generated samples closely mimic the limitedly seen malware, restore malware functionality, and mitigate impractical samples. The resulting synthetic malware is incorporated into the training set to build a model that can efficiently detect emerging malware despite limited (few-shot) exposure. The experimental results demonstrate that with the proposed "Code-Aware Data Generation" technique, we detect malware with 90% accuracy, approximately 9% higher than when training classifiers with only the limited available training data.
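
The augmentation idea in the abstract (mutate a few known samples, keep only mutants that stay close to the original, then add them to the training set) can be illustrated with a toy sketch. This is not the authors' generator or loss: the abstract does not specify either. The random byte mutation, the Hamming-distance acceptance test, and the names `mutate`, `hamming_fraction`, and `augment` are all assumptions made for illustration. The inputs are harmless byte strings, and unlike CAD-FSL this sketch makes no attempt to preserve functionality.

```python
import random


def mutate(sample, rate, rng):
    """Return a copy of `sample` with roughly `rate` of its bytes replaced at random."""
    out = bytearray(sample)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.randrange(256)
    return bytes(out)


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length, non-empty byte strings differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def augment(few_shot, per_sample, rate=0.05, max_distance=0.1, seed=0, max_tries=1000):
    """Generate up to `per_sample` accepted mutants for each few-shot sample.

    A mutant is accepted only if it differs from its source and stays within
    `max_distance` of it. This threshold is a stand-in (an assumption) for the
    paper's loss-based similarity constraint.
    """
    rng = random.Random(seed)
    synthetic = []
    for sample in few_shot:
        kept = 0
        tries = 0
        while kept < per_sample and tries < max_tries:
            tries += 1
            candidate = mutate(sample, rate, rng)
            distance = hamming_fraction(candidate, sample)
            if candidate != sample and distance <= max_distance:
                synthetic.append(candidate)
                kept += 1
    return synthetic


# Two placeholder "few-shot" samples (plain byte ramps, not real binaries).
few_shot_malware = [bytes(range(64)), bytes(reversed(range(64)))]
synthetic = augment(few_shot_malware, per_sample=5)

# The enlarged set is what a downstream classifier would be trained on.
training_malware = few_shot_malware + synthetic
```

In this sketch the acceptance threshold plays the role the abstract assigns to loss minimization: it filters out mutants that drift too far from the sample they were derived from. A real pipeline would replace both the mutation step and the filter with code-aware components.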