PyRHOH:一个元学习分析框架,用于确定编译对恶意JavaScript识别的影响

IF 4.9
Eli Fulkerson , Eric Yocam , Varghese Vaidyan , Mahesh Kamepalli , Yong Wang , Gurcan Comert
{"title":"PyRHOH:一个元学习分析框架,用于确定编译对恶意JavaScript识别的影响","authors":"Eli Fulkerson ,&nbsp;Eric Yocam ,&nbsp;Varghese Vaidyan ,&nbsp;Mahesh Kamepalli ,&nbsp;Yong Wang ,&nbsp;Gurcan Comert","doi":"10.1016/j.mlwa.2025.100724","DOIUrl":null,"url":null,"abstract":"<div><div>Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. In order for these methods to effectively classify malware, it is beneficial to reduce the effects of obfuscation as well as to optimize the configuration and structure of the neural network to be well suited for the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. Classifying in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24% e compilation was performed. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"21 ","pages":"Article 100724"},"PeriodicalIF":4.9000,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification\",\"authors\":\"Eli Fulkerson ,&nbsp;Eric Yocam ,&nbsp;Varghese Vaidyan ,&nbsp;Mahesh Kamepalli ,&nbsp;Yong Wang ,&nbsp;Gurcan Comert\",\"doi\":\"10.1016/j.mlwa.2025.100724\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. In order for these methods to effectively classify malware, it is beneficial to reduce the effects of obfuscation as well as to optimize the configuration and structure of the neural network to be well suited for the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. Classifying in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24% e compilation was performed. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.</div></div>\",\"PeriodicalId\":74093,\"journal\":{\"name\":\"Machine learning with applications\",\"volume\":\"21 \",\"pages\":\"Article 100724\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning with applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666827025001070\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025001070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

恶意JavaScript的自动识别是现代恶意软件分析中的一个核心问题。代码混淆是逃避检测的常用策略。这种混淆阻碍了人工和自动检测方法,包括神经网络技术。为了使这些方法能够有效地对恶意软件进行分类,减少混淆的影响以及优化神经网络的配置和结构以适应任务。为了克服这些挑战,我们提出了一个新的框架:“PyRHOH”(“Python可重复超参数优化工具”),这是一个实现贝叶斯优化的元学习框架。使用贝叶斯方法对候选超参数的自动探索和最大化增加了神经网络超参数选择的结构和严谨性,为实现的设计提供了最优的保证。在这项研究中,我们使用PyRHOH框架来确定用于区分恶意和良性JavaScript样本的最佳递归神经网络架构。然后,我们使用这些神经网络来测量通过谷歌的V8 JavaScript编译器将原始JavaScript样本编译成字节码对分类准确性的影响程度。对野外样本进行分类后,编译将检出率从76.88%提高到95.84%。在均匀混淆的样本中,编译将检出率从平均76.76%提高到平均91.24%。这表明将JavaScript预处理为编译后的字节码对神经网络分类有明显的积极影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification
Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. In order for these methods to effectively classify malware, it is beneficial to reduce the effects of obfuscation as well as to optimize the configuration and structure of the neural network to be well suited for the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. Classifying in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24% e compilation was performed. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Machine learning with applications
Machine learning with applications Management Science and Operations Research, Artificial Intelligence, Computer Science Applications
自引率
0.00%
发文量
0
审稿时长
98 days
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信