A user-centric machine learning framework for cyber security operations center

2017 IEEE International Conference on Intelligence and Security Informatics (ISI). Pub Date: 2017-07-22. DOI: 10.1109/ISI.2017.8004902

Charles Feng, Shuning Wu, Ningwei Liu


Citations: 51

Abstract

To assure the cyber security of an enterprise, a SIEM (Security Information and Event Management) system is typically in place to normalize security events from different preventive technologies and flag alerts. Analysts in the security operations center (SOC) investigate each alert to decide whether it is truly malicious. However, the number of alerts is generally overwhelming, with the majority being false positives, and it exceeds the SOC's capacity to handle them all. Because of this, potential malicious attacks and compromised hosts may be missed. Machine learning is a viable approach to reduce the false positive rate and improve the productivity of SOC analysts. In this paper, we develop a user-centric machine learning framework for the cyber security operations center in a real enterprise environment. We discuss the typical data sources in a SOC, their workflow, and how to leverage and process these data sets to build an effective machine learning system. The paper is targeted at two groups of readers. The first group is data scientists or machine learning researchers who lack cyber security domain knowledge but want to build machine learning systems for a security operations center. The second group is cyber security practitioners who have deep knowledge and expertise in cyber security but no machine learning experience, and who wish to build such a system themselves. Throughout the paper, we use the system we built in the Symantec SOC production environment as an example to demonstrate the complete steps: data collection, label creation, feature engineering, machine learning algorithm selection, model performance evaluation, and risk score generation.
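
The steps listed in the abstract (label creation, feature engineering, model training, risk score generation) can be illustrated with a toy sketch. This is not the authors' system: the alert records, the analyst labels, the three per-user features, the hand-rolled logistic regression, and the 0-100 score scale are all assumptions chosen only to show how alert-level events might be rolled up into a per-user risk ranking.

```python
import math
from collections import defaultdict

# Hypothetical alert records: (user, alert_type, severity 1-5).
ALERTS = [
    ("alice", "malware", 5), ("alice", "malware", 4), ("alice", "policy", 1),
    ("bob", "policy", 1), ("bob", "policy", 2),
    ("carol", "malware", 5), ("carol", "exfil", 5), ("carol", "exfil", 4),
    ("dave", "policy", 1),
]
# Hypothetical analyst labels: 1 = user confirmed risky, 0 = benign.
LABELS = {"alice": 1, "bob": 0, "carol": 1, "dave": 0}


def build_features(alerts):
    """Aggregate alert-level events into one feature vector per user."""
    per_user = defaultdict(list)
    for user, alert_type, severity in alerts:
        per_user[user].append((alert_type, severity))
    features = {}
    for user, events in per_user.items():
        severities = [s for _, s in events]
        features[user] = [
            len(events) / 10.0,                 # alert volume (scaled)
            max(severities) / 5.0,              # worst severity seen
            len({t for t, _ in events}) / 3.0,  # distinct alert types
        ]
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, targets, epochs=2000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, targets):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def risk_score(x, weights, bias):
    """Map a user's feature vector to a 0-100 risk score."""
    return 100.0 * sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


features = build_features(ALERTS)
users = sorted(features)
weights, bias = train_logistic(
    [features[u] for u in users], [LABELS[u] for u in users]
)
scores = {u: risk_score(features[u], weights, bias) for u in users}

# Analysts would review users from the highest score down.
ranked = sorted(users, key=lambda u: scores[u], reverse=True)
for user in ranked:
    print(f"{user}: {scores[user]:.1f}")
```

Scoring users rather than individual alerts is the design choice this sketch tries to reflect: an analyst works down a short ranked list instead of triaging every alert. A production system would need far richer features, many more labels, and evaluation on held-out data.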
