A Practical Guide for Detecting the Java Script-Based Malware Using Hidden Markov Models and Linear Classifiers

2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Pub Date: 2014-09-01. DOI: 10.1109/SYNASC.2014.39

Doina Cosovan, Razvan Benchea, Dragos Gavrilut


Citations: 8

Abstract

The World Wide Web has evolved so rapidly that it is no longer considered a luxury but a necessity. That is why the most popular infection vectors currently used by cyber criminals are either web pages or commonly used documents (such as PDF files). In both cases, the malicious actions are written in Java Script. Because of this, Java Script has become the preferred language for spreading malware. To stop malicious content from executing, detecting its infection vector is crucial. In this paper we propose various methods for detecting Java Script-based attack vectors. To achieve this goal, we first need to counter the metamorphism techniques commonly used in malicious Java Script code, which are by no means trivial: garbage instruction insertion, variable renaming, equivalent instruction substitution, function permutation, instruction reordering, and so on. Our approach to dealing with metamorphism starts by splitting the Java Script content into components and filtering out the insignificant ones. We then use a data set consisting of over one million Java Script files to test several machine learning algorithms for malware detection, such as Hidden Markov Models, linear classifiers, and hybrid approaches. Finally, we analyze these detection methods from a practical point of view, emphasizing the need for a very low false positive rate and the ability to be trained on large datasets.
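
The abstract does not say how the content is split into components or which linear classifier is used, so the following Python sketch is only an illustration of the general idea and not the authors' pipeline. It assumes a hypothetical regex tokenizer that drops comments and whitespace and collapses identifiers, strings, and numbers into the class symbols `ID`, `STR`, and `NUM` (which makes the symbol sequence insensitive to variable renaming), and it uses a plain perceptron over symbol bigram counts as one example of a linear classifier. The token classes, the `WATCHED` word list, and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical token classes; the paper's real component set is not given in the abstract.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<space>\s+)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_$][\w$]*)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)

# Words kept verbatim (assumed list); every other identifier becomes "ID".
WATCHED = {"var", "function", "return", "if", "else", "for", "while", "new", "eval"}

def tokenize(source):
    """Split source into symbols, filtering comments and whitespace."""
    symbols = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue  # treated here as insignificant components
        if kind == "word":
            symbols.append(text if text in WATCHED else "ID")
        elif kind == "string":
            symbols.append("STR")
        elif kind == "number":
            symbols.append("NUM")
        else:
            symbols.append(text)
    return symbols

def bigram_features(symbols):
    """Count adjacent symbol pairs."""
    return Counter(zip(symbols, symbols[1:]))

def score(model, feats):
    weights, bias = model
    return bias + sum(weights[f] * v for f, v in feats.items())

def train_perceptron(samples, epochs=10):
    """samples: list of (features, label), label +1 (malicious) or -1 (benign)."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, label in samples:
            if label * score((weights, bias), feats) <= 0:  # misclassified
                for f, v in feats.items():
                    weights[f] += label * v
                bias += label
    return weights, bias

def predict(model, feats):
    return 1 if score(model, feats) > 0 else -1

# Toy training set (invented for illustration, not from the paper's data).
train = [
    (bigram_features(tokenize('eval(unescape("%75%6e"));')), 1),
    (bigram_features(tokenize("var total = price + tax;")), -1),
]
model = train_perceptron(train)
```

In this toy setup, `tokenize("var a = 1; // x")` and `tokenize("var zz=2;")` yield the same symbol sequence, so renaming a variable or adding a comment does not change the features the classifier sees. A real system would need a proper Java Script lexer and a far larger training set.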

