Malicious webshell family dataset for webshell multi-classification research

IF 3.8; Region 3, Computer Science; JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Visual Informatics. Pub Date: 2024-03-01. DOI: 10.1016/j.visinf.2023.06.008

Ying Zhao, Shenglan Lv, Wenwei Long, Yilun Fan, Jian Yuan, Haojin Jiang, Fangfang Zhou

Volume 8, Issue 1, Pages 47-55. Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S2468502X23000335

Citations: 0

Abstract



Malicious webshells currently present tremendous threats to cloud security. Most relevant studies and open webshell datasets treat malicious webshell defense as a binary classification problem, that is, identifying whether a webshell is malicious or benign. However, fine-grained multi-classification is urgently needed to enable precise responses to and active defenses against malicious webshell threats. This paper introduces a malicious webshell family dataset named MWF to facilitate webshell multi-classification research. The dataset contains 1359 malicious webshell samples originally obtained from the cloud servers of Alibaba Cloud. Each sample is provided with a family label. Samples of the same family generally present similar characteristics or behaviors. The dataset has a total of 78 families and 22 outliers. Moreover, this paper introduces the human–machine collaboration process adopted to remove benign or duplicate samples, address privacy issues, and determine the family of each sample. This paper also compares the distinguishing features of the MWF dataset with those of previous datasets and summarizes potential application areas in cloud security and in generalized sequence, graph, and tree data analytics and visualization.


Source journal

Visual Informatics (Computer Science: Computer Graphics and Computer-Aided Design)

CiteScore: 6.70

Self-citation rate: 3.30%

Articles published:

Review time: 79 days