Towards Selecting Informative Content for Cyber Threat Intelligence

2021 IEEE International Conference on Cyber Security and Resilience (CSR), Pub Date: 2021-07-26, DOI: 10.1109/CSR51186.2021.9527909

Panos Panagiotou, Christos Iliou, Konstantinos Apostolou, T. Tsikrika, S. Vrochidis, P. Chatzimisios, I. Kompatsiaris


Citations: 2

Abstract



Towards Selecting Informative Content for Cyber Threat Intelligence

Nowadays, there is an increasing need for cyber security professionals to make use of tools that automatically extract Cyber Threat Intelligence (CTI) relying on information collected from relevant blogs and news sources that are publicly available. When such sources are used, an important part of the CTI extraction process is content selection, in which pages that do not contain CTI-related information should be filtered out. For this task, we apply supervised machine learning-based text classification techniques, trained on a new dataset created for the purposes of this work. Furthermore, we show in practice the importance of a good content selection process in a commonly used CTI extraction pipeline, by inspecting the results of the Named Entity Recognition (NER) process that normally follows.
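The content selection step described in the abstract can be sketched as a binary text classifier placed in front of the NER stage. The abstract does not name the classifiers or features the authors used, so the example below swaps in a small multinomial Naive Bayes model written with the Python standard library only. The class name, the function names, the labels "cti" and "other", and the toy training sentences are all assumptions made for illustration; none of them come from the paper or its dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesContentSelector:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        # Per-class word frequencies collected from the training pages.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log(
                    (self.word_counts[c][tok] + 1)
                    / (total_words + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def select_content(selector, pages):
    """Keep only the pages classified as CTI-related."""
    return [page for page in pages if selector.predict(page) == "cti"]


# Hypothetical toy training set; a real one would hold labelled web pages.
TRAIN = [
    ("New ransomware exploits a vulnerability in VPN appliances", "cti"),
    ("Attackers deploy malware through phishing emails", "cti"),
    ("Patch released for remote code execution vulnerability", "cti"),
    ("Company announces quarterly earnings and new office", "other"),
    ("Conference registration opens with early discounts", "other"),
    ("Our team celebrates ten years with a new office party", "other"),
]
```

In a pipeline of the kind the abstract outlines, only the pages returned by `select_content` would be passed on to NER, so entities found in unrelated pages never reach the extracted intelligence.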

