Building a large scale Intrusion Detection System using Big Data technologies

Proceedings of International Symposium on Grids and Clouds 2018 in conjunction with Frontiers in Computational Drug Discovery — PoS(ISGC 2018 & FCDD) Pub Date : 2018-12-12 DOI:10.22323/1.327.0014

Pablo Panero, L. Valsan, Vincent Brillault, Ioan Cristian Schuszter


Citations: 2

Abstract



Computer security threats have always been a major concern, and they continue to grow in frequency and complexity. Attack techniques evolve rapidly, which makes detection harder, so the tools used to counter them must evolve at least as fast.

This paper presents the implementation of an Intrusion Detection System (IDS) used at CERN that operates at both the network (NIDS) and host (HIDS) level. The system currently processes approximately 1 TB of data per day in real time, with the final goal of handling at least 5 TB per day. To reach this goal, an infrastructure was first developed to collect data from sources such as system logs, web server logs, and NIDS logs, using technologies such as Apache Flume and Apache Kafka. Once collected, the data is searched for malicious activity: Apache Spark jobs consume it and compare it in real time against known signatures of malicious activity, known as Indicators of Compromise (IoCs). These are published by many security experts and centralized in a local Malware Information Sharing Platform (MISP) instance.

Detecting an intrusion is not enough, however; it is also necessary to understand what happened and why. To provide context for a detected intrusion, the data is enriched in real time as it passes through the pipeline, for example with DNS resolution and IP geolocation. The result is a system generic enough to process any kind of data in JSON format: it enriches the data to add context about what is happening and then looks for indicators of compromise to detect possible intrusions, making use of the latest technologies in the Big Data ecosystem.
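The enrich-then-match flow described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the paper uses Apache Spark jobs fed from Kafka and indicators drawn from MISP, whereas the sketch below is plain Python with hard-coded stand-ins. The field names (`src_ip`, `src_hostname`, `src_country`, `ioc_hits`), the lookup tables, and the indicator values are all hypothetical and use reserved documentation addresses.

```python
import ipaddress
import json

# Hypothetical indicators of compromise; a real deployment would pull these from MISP.
IOC_IPS = {"203.0.113.7"}
IOC_DOMAINS = {"malicious.example"}

# Hypothetical geolocation table (documentation address ranges, placeholder country codes).
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "ZZ",
    ipaddress.ip_network("198.51.100.0/24"): "YY",
}

# Hypothetical reverse-DNS table standing in for a live resolver.
RDNS_TABLE = {"203.0.113.7": "malicious.example"}


def enrich(event):
    """Return a copy of the event with DNS and geolocation context added."""
    enriched = dict(event)
    ip = event.get("src_ip")
    if ip is None:
        return enriched
    enriched["src_hostname"] = RDNS_TABLE.get(ip)
    addr = ipaddress.ip_address(ip)
    enriched["src_country"] = next(
        (country for net, country in GEO_TABLE.items() if addr in net), None
    )
    return enriched


def match_iocs(event):
    """Return the list of indicators that the enriched event matches."""
    hits = []
    if event.get("src_ip") in IOC_IPS:
        hits.append("ip:" + event["src_ip"])
    if event.get("src_hostname") in IOC_DOMAINS:
        hits.append("domain:" + event["src_hostname"])
    return hits


def process(line):
    """Parse one JSON log line, enrich it, and attach any IoC matches."""
    event = enrich(json.loads(line))
    event["ioc_hits"] = match_iocs(event)
    return event
```

Calling `process` on a JSON log line returns the original fields plus the enrichment fields and a list of matched indicators. Enriching before matching is a deliberate choice in this sketch: it lets a domain indicator fire on an event that originally carried only an IP address.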

