NLP-based approaches for malware classification from API sequences

2017 21st Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES). Pub Date: 2017-11-01. DOI: 10.1109/IESYS.2017.8233569

T. Tran, Hiroshi Sato


Citations: 37

Abstract



Malware analysis relies on two basic approaches, static analysis and dynamic analysis, to understand how a particular piece of malware functions. With dynamic analysis, researchers can collect API call sequences, which are a valuable source of information for identifying malware behavior. The malware classification procedures proposed in this paper use API call sequences as inputs to classifiers. Drawing on developments in Natural Language Processing, we use methods such as n-grams, doc2vec (Paragraph Vectors), and TF-IDF to convert these API sequences into numeric vectors before feeding them to the classifiers. We propose three classification methods: TF-IDF, Paragraph Vector with Distributed Bag of Words, and Paragraph Vector with Distributed Memory. Each achieves very good accuracy.
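As a rough illustration of the n-gram plus TF-IDF step mentioned in the abstract, the sketch below turns API call sequences into sparse numeric vectors using only the Python standard library. It is not the authors' implementation: the abstract does not state the n-gram size, the exact TF-IDF weighting, or the classifier, so the bigram setting, the log(N/df) weighting, the cosine comparison, and the example API traces are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(calls, n=2):
    """Return the contiguous n-grams of one API call sequence as tuples."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def tfidf_vectors(sequences, n=2):
    """Map each API call sequence to a sparse {n-gram: weight} TF-IDF vector.

    Assumed weighting: term frequency normalized by sequence length,
    multiplied by log(N / document frequency).
    """
    docs = [Counter(ngrams(seq, n)) for seq in sequences]
    # Number of sequences that contain each n-gram at least once.
    doc_freq = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        length = sum(doc.values()) or 1
        vectors.append({
            gram: (count / length) * math.log(total / doc_freq[gram])
            for gram, count in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical API traces, invented for this example (not data from the paper).
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"],
    ["CreateFileW", "WriteFile", "CloseHandle", "RegOpenKeyExW"],
    ["socket", "connect", "send", "recv"],
]
vectors = tfidf_vectors(traces)
similar = cosine(vectors[0], vectors[1])    # traces sharing file-write bigrams
unrelated = cosine(vectors[0], vectors[2])  # traces with no bigram in common
```

In this sketch the first two traces share two bigrams and therefore score a higher cosine similarity than the first and third, which share none. In a full pipeline, vectors like these would be passed to a classifier; for the two Paragraph Vector variants, a doc2vec implementation would take the place of `tfidf_vectors`.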

