Malicious webshell family dataset for webshell multi-classification research

IF 3.8; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Ying Zhao, Shenglan Lv, Wenwei Long, Yilun Fan, Jian Yuan, Haojin Jiang, Fangfang Zhou
Journal: Visual Informatics. Published: 2024-03-01. DOI: 10.1016/j.visinf.2023.06.008
URL: https://www.sciencedirect.com/science/article/pii/S2468502X23000335
Citations: 0

Abstract

Malicious webshells currently pose a serious threat to cloud security. Most relevant studies and open webshell datasets treat malicious webshell defense as a binary classification problem, that is, identifying whether a webshell is malicious or benign. However, fine-grained multi-classification is urgently needed to enable precise responses and active defenses against malicious webshell threats. This paper introduces a malicious webshell family dataset named MWF to facilitate webshell multi-classification research. The dataset contains 1359 malicious webshell samples originally obtained from the cloud servers of Alibaba Cloud. Each sample is provided with a family label. Samples of the same family generally present similar characteristics or behaviors. The dataset has a total of 78 families and 22 outliers. Moreover, this paper introduces the human–machine collaboration process that is adopted to remove benign or duplicate samples, address privacy issues, and determine the family of each sample. This paper also compares the distinguishing features of the MWF dataset with those of previous datasets and summarizes potential application areas in cloud security and in generalized sequence, graph, and tree data analytics and visualization.
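
The abstract does not describe the dataset's file layout or the tooling used in the human–machine collaboration process, so the following is only a minimal sketch of two steps it mentions: dropping duplicate samples and summarizing family labels. The tuple layout `(name, content, family)`, the label string `"outlier"`, the function names, and the placeholder byte strings are all assumptions made for illustration, and exact-hash matching stands in for whatever duplicate detection the authors actually used.

```python
import hashlib
from collections import Counter


def dedupe_by_hash(samples):
    """Drop byte-identical samples, keeping the first occurrence of each.

    `samples` is a list of (name, content_bytes, family_label) tuples;
    this layout is hypothetical, not the MWF dataset's real format.
    """
    seen = set()
    unique = []
    for name, content, family in samples:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, content, family))
    return unique


def family_sizes(samples, outlier_label="outlier"):
    """Return (per-family sample counts, number of outlier samples).

    The label "outlier" is an assumed marker for samples without a family.
    """
    counts = Counter(family for _, _, family in samples)
    outliers = counts.pop(outlier_label, 0)
    return counts, outliers


# Placeholder contents stand in for real sample files.
samples = [
    ("a.php", b"placeholder-content-1", "family_01"),
    ("b.php", b"placeholder-content-1", "family_01"),  # byte-identical to a.php
    ("c.php", b"placeholder-content-2", "family_02"),
    ("d.jsp", b"placeholder-content-3", "outlier"),
]

unique = dedupe_by_hash(samples)
counts, outliers = family_sizes(unique)
print(len(unique), dict(counts), outliers)
```

Exact hashing only catches byte-identical files; near-duplicates that differ in whitespace or variable names would need a fuzzier comparison, which is presumably where manual review in a human–machine process becomes useful.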

Source journal: Visual Informatics (Computer Science: Computer Graphics and Computer-Aided Design)
CiteScore: 6.70
Self-citation rate: 3.30%
Articles published: 33
Review time: 79 days