Saurabh Kumar;Cristian Molinaro;Lirika Sola;V. S. Subrahmanian
DOI: 10.1109/TETC.2025.3583872
IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 3, pp. 1299–1315. Published 2025-07-10. Impact factor: 5.4. JCR: Q1 (Computer Science, Information Systems).
https://ieeexplore.ieee.org/document/11075921/
GLAMP: Generative Learning for Adversarially-Robust Malware Prediction
We propose a novel Generative Malware Defense strategy. When an antivirus company detects a malware sample $m$, it should: (i) generate a set ${Var}(m)$ of several variants of $m$ and then (ii) train its malware classifiers on the usual training set augmented with ${Var}(m)$. We believe this leads to a more proactive defense by making the classifiers more robust to future malware developed by the attacker. We formally define the malware generation problem as a non-traditional optimization problem. Our novel GLAMP (Generative Learning for Adversarially-robust Malware Prediction) framework analyzes the complexity of the malware generation problem and includes novel malware variant generation algorithms for step (i) that leverage the complexity results. Our experiments show that a sufficiently large percentage of the samples generated by GLAMP evade both commercial antivirus engines and machine learning classifiers, with evasion rates up to 83.81% and 50.54%, respectively: GLAMP generates running malware that can evade 11 white-box classifiers and 4 commercial (i.e., black-box) detectors. GLAMP also proposes an adversarial training model; our experiments show that its best adversarial training engine improves recall by 16.1% and the F1 score by 2.4%–5.4%, depending on the test set used.
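The two-step defense described in the abstract — generate a variant set ${Var}(m)$ from a detected sample $m$, then retrain on the augmented training set — can be sketched as follows. This is only an illustrative toy: the variant generator, the numeric feature representation, and the centroid-distance "classifier" are placeholder assumptions, not the paper's actual GLAMP algorithms, which operate on real executables.

```python
import random

def generate_variants(sample, n=5, noise=0.3):
    # Placeholder for GLAMP's variant generators: perturb the feature
    # vector of a detected malware sample m to produce Var(m).
    return [tuple(x + random.uniform(-noise, noise) for x in sample)
            for _ in range(n)]

def train_centroid(samples):
    # Toy "classifier": the centroid of known-malware feature vectors.
    dims = len(samples[0])
    return tuple(sum(s[d] for s in samples) / len(samples)
                 for d in range(dims))

def is_malware(centroid, sample, threshold=1.0):
    # Flag a sample as malware if it lies close to the malware centroid.
    dist = sum((a - b) ** 2 for a, b in zip(centroid, sample)) ** 0.5
    return dist <= threshold

random.seed(0)

# Step (i): a detected sample m and its generated variant set Var(m).
m = (1.0, 1.0)
var_m = generate_variants(m)

# Step (ii): retrain on the usual training set augmented with Var(m).
training_set = [(0.9, 1.1), (1.1, 0.9)]
augmented = training_set + [m] + var_m
model = train_centroid(augmented)
```

A classifier trained this way has seen points in the neighborhood of $m$, so nearby future variants are more likely to be caught, which is the proactive-defense intuition the abstract states.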
Journal Introduction:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green; synthetic and organic computing structures and systems; advanced analytics; social/occupational computing; location-based/client computer systems; morphic computer design; electronic game systems; and health-care IT.