NLP-based techniques for Cyber Threat Intelligence

IF 12.7 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science Review · Pub Date: 2025-06-03 · DOI: 10.1016/j.cosrev.2025.100765

Marco Arazzi, Dincy R. Arikkat, Serena Nicolazzo, Antonino Nocera, Rafidha Rehiman K.A., Vinod P., Mauro Conti

Computer Science Review, Volume 58, Article 100765. https://www.sciencedirect.com/science/article/pii/S1574013725000413

Citations: 0

Abstract

In the digital era, threat actors employ sophisticated techniques that often leave digital traces in the form of textual data. Cyber Threat Intelligence (CTI) covers the collection, processing, and analysis of data useful for understanding a threat actor's targets and attack behavior. CTI is playing an increasingly crucial role in identifying and mitigating threats and in enabling proactive defense strategies. In this context, Natural Language Processing (NLP), a branch of artificial intelligence, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey provides a comprehensive overview of NLP-based techniques applied to threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then examines NLP-based techniques for crawling CTI data from Web sources, CTI data analysis, relation extraction from cybersecurity data, CTI sharing and collaboration, security threats to CTI, and the role of Large Language Models (LLMs) in this domain. Finally, it examines the challenges and limitations of NLP in threat intelligence, including data quality issues and ethical considerations. The survey offers a complete framework and serves as a resource for security professionals and researchers seeking to understand state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity.


Source journal

Computer Science Review Computer Science-General Computer Science

CiteScore

32.70

Self-citation rate

0.00%

Publication volume

Review time

51 days

About the journal: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known computer science methods to other areas are in scope only if they advance the fundamental understanding of those methods.