PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification

IF 4.9

Machine learning with applications Pub Date : 2025-08-18 DOI:10.1016/j.mlwa.2025.100724

Eli Fulkerson, Eric Yocam, Varghese Vaidyan, Mahesh Kamepalli, Yong Wang, Gurcan Comert

Machine learning with applications, Volume 21, Article 100724. Source: https://www.sciencedirect.com/science/article/pii/S2666827025001070

Citations: 0


Abstract

Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. In order for these methods to effectively classify malware, it is beneficial to reduce the effects of obfuscation as well as to optimize the configuration and structure of the neural network to be well suited for the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. Classifying in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24%. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.
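The reported detection rates can be restated as gains in percentage points and as relative reductions in missed samples. The short sketch below does that arithmetic using only the four figures quoted in the abstract. It assumes that "detection rate" means the share of samples classified correctly, so that the missed share is 100 minus the detection rate; the abstract does not define the metric, and the `summarize` helper is a hypothetical name introduced for illustration.

```python
def summarize(before_pct, after_pct):
    """Return (gain in percentage points, relative drop in missed samples in %).

    Assumption: missed share = 100 - detection rate. The abstract does not
    state how the detection rate is defined, so treat this as illustrative.
    """
    gain_points = after_pct - before_pct
    missed_before = 100.0 - before_pct
    missed_after = 100.0 - after_pct
    relative_miss_reduction = 1.0 - missed_after / missed_before
    return round(gain_points, 2), round(100.0 * relative_miss_reduction, 1)


# Detection rates quoted in the abstract (raw source vs. V8 bytecode).
results = {
    "in-the-wild": summarize(76.88, 95.84),
    "uniformly obfuscated (average)": summarize(76.76, 91.24),
}

for name, (gain, reduction) in results.items():
    print(f"{name}: +{gain} points, {reduction}% fewer missed samples")
```

Under that assumption, compilation adds 18.96 points for in-the-wild samples and 14.48 points on average for uniformly obfuscated samples, which corresponds to roughly 82.0% and 62.3% fewer missed samples respectively.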


Source journal: Machine learning with applications

Subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: 98 days