Searching for Malware Dataset: a Systematic Literature Review

Luqman Muhammad Zagi, Baharuddin Aziz
DOI: 10.1109/ICITSI50517.2020.9264929
Published in: 2020 International Conference on Information Technology Systems and Innovation (ICITSI), 19 October 2020
Citations: 1

Abstract

Malware is a topic widely discussed by both academics and researchers, but a list of malware sources is rarely provided. This paper therefore presents a Systematic Literature Review (SLR) to identify which datasets previous researchers commonly use. Three journal databases were searched: IEEE, ScienceDirect, and ACM. The PRISMA statement was applied to keep the literature review transparent. To narrow the search, the authors also defined inclusion and exclusion criteria for the SLR process. The inclusion criteria are: (1) the full article is written in English; (2) the paper is peer-reviewed; (3) the paper explicitly names the dataset or database used; and (4) the paper explicitly states the method used to find malware characteristics and behavior. The exclusion criteria are: (1) articles written before 2015; (2) books and white papers; (3) articles already indexed in another of the searched databases; and (4) papers shorter than four pages. After both filtering stages, 42 of 245 articles were eligible to answer the research questions (RQs): (1) Where do researchers usually find malware databases or datasets? (2) What methods have previous researchers applied to find malware characteristics or behavior? (3) Which platforms does malware usually attack? Across the three RQs, the review recorded 37 datasets (RQ1), 47 methods (RQ2), and six platforms (RQ3).
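The screening step described in the abstract can be read as a filter over candidate records: a paper is eligible only if it meets every inclusion criterion and triggers none of the exclusion criteria. The sketch below is a minimal illustration under stated assumptions, not the authors' tooling. The `Record` fields and the helper names `meets_inclusion`, `meets_exclusion`, and `screen` are hypothetical, and judgments such as "explicitly names a dataset" are reduced to boolean flags a reviewer would set by hand.

```python
from dataclasses import dataclass


# Hypothetical record schema; the review does not publish one.
@dataclass
class Record:
    title: str
    language: str
    peer_reviewed: bool
    names_dataset: bool   # explicitly names the dataset or database used
    states_method: bool   # explicitly states the method for malware characteristics/behavior
    year: int
    kind: str             # e.g. "article", "book", "white paper"
    pages: int
    duplicate: bool       # already indexed in another searched database


def meets_inclusion(r: Record) -> bool:
    # Inclusion criteria (1)-(4): all must hold.
    return (r.language == "English"
            and r.peer_reviewed
            and r.names_dataset
            and r.states_method)


def meets_exclusion(r: Record) -> bool:
    # Exclusion criteria (1)-(4): any one is enough to drop the record.
    return (r.year < 2015
            or r.kind in {"book", "white paper"}
            or r.duplicate
            or r.pages < 4)


def screen(records: list[Record]) -> list[Record]:
    # Keep records that pass inclusion and trigger no exclusion.
    return [r for r in records if meets_inclusion(r) and not meets_exclusion(r)]


# Hypothetical sample records, for illustration only.
sample = [
    Record("A", "English", True, True, True, 2018, "article", 10, False),
    Record("B", "English", True, True, True, 2013, "article", 12, False),  # written before 2015
    Record("C", "English", True, False, True, 2019, "article", 8, False),  # no dataset named
]
print([r.title for r in screen(sample)])  # ['A']
```

Keeping inclusion and exclusion as separate predicates mirrors the two criteria lists in the abstract, so each stage can be counted separately when reporting a PRISMA flow.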