Quantifying the Impact of Design Strategies for Big Data Cyber Security Analytics: An Empirical Investigation

Faheem Ullah, M. Babar
Published in: 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
Publication date: 2019-12-01
DOI: 10.1109/PDCAT46702.2019.00037 (https://doi.org/10.1109/PDCAT46702.2019.00037)
Citations: 2

Abstract

Big Data Cyber Security Analytics (BDCA) systems use big data technologies (e.g., Hadoop and Spark) to collect, store, and analyze large volumes of security event data in order to detect cyber-attacks. State-of-the-art systems use various design strategies (e.g., feature selection and alert ranking) to help BDCA systems achieve the desired levels of accuracy and response time. However, these strategies are not applied consistently across the state of the art, which exposes a lack of consensus on when to use (and not to use) them. In this paper, we follow a systematic experimentation framework to quantify the impact of four design strategies on accuracy and response time with respect to three contextual factors: the security data, the machine learning model employed in the system, and the execution mode of the system. To this end, we performed experiments on a Hadoop-based BDCA system using four security datasets, five machine learning models, and three execution modes. Our findings lead us to formulate a set of design guidelines that will help researchers and practitioners decide when to use (and not to use) these design strategies.
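
The experimental setup described in the abstract can be pictured as a grid over the three contextual factors. The sketch below is only an illustration under stated assumptions: the abstract gives the counts (four datasets, five models, three execution modes) but not their names, so all labels are placeholders; it is also an assumption that every combination is run and that impact is measured by comparing a run with a strategy enabled against the same run without it. The function names `experiment_grid` and `impact` are hypothetical.

```python
from itertools import product

# Placeholder labels: the abstract states only the counts, not the names.
DATASETS = [f"dataset_{i}" for i in range(1, 5)]      # four security datasets
MODELS = [f"model_{i}" for i in range(1, 6)]          # five machine learning models
EXECUTION_MODES = [f"mode_{i}" for i in range(1, 4)]  # three execution modes


def experiment_grid(strategy):
    """Yield a paired run (strategy off, strategy on) for every context.

    Assumption: a full crossing of the three contextual factors; the
    abstract does not say whether every combination was actually run.
    """
    for dataset, model, mode in product(DATASETS, MODELS, EXECUTION_MODES):
        for enabled in (False, True):
            yield {
                "strategy": strategy,
                "enabled": enabled,
                "dataset": dataset,
                "model": model,
                "mode": mode,
            }


def impact(with_strategy, without_strategy):
    """Relative change in a metric (accuracy or response time).

    Assumption: impact is expressed as a relative difference; the
    abstract does not define the measure.
    """
    return (with_strategy - without_strategy) / without_strategy


# "feature_selection" is one of the two strategies the abstract names.
runs = list(experiment_grid("feature_selection"))
print(len(runs))  # 4 * 5 * 3 contexts, each run with and without the strategy
```

Pairing each context with an on/off run keeps the contextual factors fixed while the strategy varies, which is one straightforward way to attribute a change in accuracy or response time to the strategy itself.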