GLAMP: Generative Learning for Adversarially-Robust Malware Prediction

IF 5.4 | Zone 2, Computer Science | JCR Q1, Computer Science, Information Systems

IEEE Transactions on Emerging Topics in Computing | Pub Date: 2025-07-10 | DOI: 10.1109/TETC.2025.3583872

Saurabh Kumar; Cristian Molinaro; Lirika Sola; V. S. Subrahmanian

Volume 13, Issue 3, pp. 1299-1315. Journal Article. https://ieeexplore.ieee.org/document/11075921/

Abstract

We propose a novel Generative Malware Defense strategy. When an antivirus company detects a malware sample $m$, they should: (i) generate a set $\mathit{Var}(m)$ of several variants of $m$ and then (ii) train their malware classifiers on their usual training set augmented with $\mathit{Var}(m)$. We believe this leads to a more proactive defense by making the classifiers more robust to future malware developed by the attacker. We formally define the malware generation problem as a non-traditional optimization problem. Our novel GLAMP (Generative Learning for Adversarially-robust Malware Prediction) framework analyzes the complexity of the malware generation problem and includes novel malware variant generation algorithms for (i) that leverage the complexity results. Our experiments show that a sufficiently large percentage of samples generated by GLAMP are able to evade both commercial antivirus and machine learning classifiers, with evasion rates up to 83.81% and 50.54%, respectively. GLAMP then proposes an adversarial training model as well. Our experiments show that GLAMP generates running malware that can evade 11 white-box classifiers and 4 commercial (i.e., black-box) detectors. Our experiments show GLAMP’s best adversarial training engine improves the recall by 16.1% and the F1 score by 2.4%-5.4%, depending on the test set used.
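The two-step strategy in the abstract (generate variants of a detected sample, then retrain on the augmented training set) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: samples are toy feature vectors, `toy_variants` is a hypothetical stand-in that adds random noise and is not GLAMP's generation algorithm, and the nearest-centroid classifier is merely a placeholder for whatever malware classifier is actually used.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, ...]    # toy feature vector standing in for a binary
Labeled = Tuple[Sample, int]  # (features, label): 1 = malware, 0 = benign


def toy_variants(m: Sample, k: int, rng: random.Random,
                 scale: float = 0.1) -> List[Sample]:
    """Hypothetical stand-in for Var(m): k randomly perturbed copies of m.

    NOT GLAMP's algorithm. Real variants must remain working malware,
    which random feature noise does not guarantee.
    """
    return [tuple(x + rng.uniform(-scale, scale) for x in m) for _ in range(k)]


def augment(train: Sequence[Labeled], detected: Sequence[Sample],
            make_variants: Callable[[Sample], List[Sample]]) -> List[Labeled]:
    """Step (ii): the usual training set plus Var(m) for each detected m."""
    augmented = list(train)
    for m in detected:
        augmented.extend((v, 1) for v in make_variants(m))  # variants are malware
    return augmented


def fit_centroids(data: Sequence[Labeled]) -> Dict[int, Sample]:
    """Placeholder classifier: one mean feature vector per label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def predict(centroids: Dict[int, Sample], x: Sample) -> int:
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


if __name__ == "__main__":
    rng = random.Random(0)
    train = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((1.0, 1.0), 1)]
    detected = [(1.0, 0.8)]  # newly detected malware sample m
    augmented = augment(train, detected, lambda m: toy_variants(m, 5, rng))
    model = fit_centroids(augmented)
    print(len(augmented), predict(model, (0.9, 0.9)))
```

Passing the variant generator as a function keeps the augmentation step independent of how the variants are produced, so a real generator could be swapped in without touching the training code.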

Source journal

IEEE Transactions on Emerging Topics in Computing (Computer Science: Computer Science (miscellaneous))

CiteScore: 12.10

Self-citation rate: 5.10%

Articles published: 113

Journal description: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, and Health-care IT.