An innovative GPT-based open-source intelligence using historical cyber incident reports

Natural Language Processing Journal, Pub Date: 2024-04-24, DOI: 10.1016/j.nlp.2024.100074

Fahim Sufi


Citations: 0

Abstract

In contemporary discourse, the pervasive influence of Generative Pre-trained Transformer (GPT) models and Large Language Models (LLMs) is evident across diverse applications. GPT-based technologies go beyond mere summarization and are adept at discerning critical information from extensive textual corpora. By extracting semantically meaningful content from textual representations, GPT technologies enable automated feature extraction, a departure from fallible manual extraction methodologies. This study proposes an innovative paradigm that leverages GPT to extract multidimensional cyber threat-related features from textual descriptions of cyber events. The extracted features serve as inputs for artificial intelligence (AI) and deep learning algorithms, including Convolutional Neural Network (CNN), decomposition analysis, and Natural Language Processing (NLP)-based modalities tailored for non-technical cyber strategists. The proposed framework lets cyber strategists or analysts pose questions about historical cyber incidents in plain English, and the NLP-based interaction facet of the system returns AI-driven insights in natural language. Furthermore, salient insights that are often elusive in dynamic visualizations are presented succinctly in plain language. The entire system was validated empirically through autonomous acquisition of semantically enriched contextual information on 214 major cyber incidents from 2016 to 2023. GPT-based responses on Actor Type, Target, Attack Source (i.e., country originating the attack), Attack Destination (i.e., targeted country), Attack Level, Attack Type, and Attack Timeline underwent critical AI-driven analysis. This 7-dimensional information, gleaned from the corpus of 214 incidents, yielded 1498 informative outputs, with a precision of 96%, a recall of 98%, and an F1-score of 97%.
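The figures reported in the abstract can be checked against each other with a little arithmetic. The sketch below is illustrative only: `IncidentFeatures`, `expected_outputs`, and `f1_score` are hypothetical names that do not come from the paper, and the abstract does not say the F1-score was derived from the rounded precision and recall values used here. It assumes one output per dimension per incident and the standard harmonic-mean definition of F1.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IncidentFeatures:
    """Hypothetical record holding the seven dimensions named in the abstract."""
    actor_type: str
    target: str
    attack_source: str       # country originating the attack
    attack_destination: str  # targeted country
    attack_level: str
    attack_type: str
    attack_timeline: str


def expected_outputs(n_incidents: int) -> int:
    """Assumes one extracted output per dimension per incident."""
    return n_incidents * len(fields(IncidentFeatures))


def f1_score(precision: float, recall: float) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = expected_outputs(214)
f1 = f1_score(0.96, 0.98)
print(total)         # 1498
print(round(f1, 2))  # 0.97
```

Under these assumptions, 214 incidents times 7 dimensions gives the reported 1498 outputs, and the rounded precision and recall are consistent with the reported F1-score of 97%.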
