An innovative GPT-based open-source intelligence using historical cyber incident reports
Fahim Sufi. Natural Language Processing Journal, vol. 7, Article 100074, published 24 April 2024. DOI: 10.1016/j.nlp.2024.100074. https://www.sciencedirect.com/science/article/pii/S2949719124000220
Generative Pre-trained Transformer (GPT) models and other Large Language Models (LLMs) are now used in a wide range of applications. Beyond summarization, GPT-based tools can identify critical information in large text corpora. Because they extract semantically meaningful content from text, they enable automated feature extraction, replacing error-prone manual extraction. This study proposes a GPT-based approach for extracting multidimensional cyber-threat features from textual descriptions of cyber incidents. The extracted features serve as inputs to artificial intelligence (AI) and deep learning methods, including a Convolutional Neural Network (CNN), decomposition analysis, and Natural Language Processing (NLP)-based interaction designed for non-technical cyber strategists. The framework lets cyber strategists and analysts ask questions about historical cyber incidents in plain English, and its NLP-based interface answers with AI-driven insights in natural language. It also states in plain language key insights that are easy to miss in dynamic visualizations. The system was validated empirically by automatically collecting semantically enriched contextual information on 214 major cyber incidents from 2016 to 2023. GPT-based responses on seven dimensions (Actor Type, Target, Attack Source (the country from which the attack originated), Attack Destination (the targeted country), Attack Level, Attack Type, and Attack Timeline) were then critically analyzed with AI methods. Across the 214 incidents, these seven dimensions produced 1,498 informative outputs (214 × 7), which achieved a precision of 96%, a recall of 98%, and an F1-score of 97%.
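The abstract names the seven dimensions GPT returns for each incident but not how the replies are represented. The sketch below assumes, purely for illustration, that the model is prompted to answer with a JSON object; the key names (`actor_type`, `attack_source`, and so on) and the function `parse_gpt_reply` are hypothetical. It shows how a reply could be validated before its features are passed to downstream analysis. It makes no model call and is not the author's implementation.

```python
import json
from dataclasses import dataclass

# Hypothetical JSON keys for the seven dimensions named in the abstract,
# mapped to the human-readable dimension names.
FIELDS = {
    "actor_type": "Actor Type",
    "target": "Target",
    "attack_source": "Attack Source",
    "attack_destination": "Attack Destination",
    "attack_level": "Attack Level",
    "attack_type": "Attack Type",
    "attack_timeline": "Attack Timeline",
}


@dataclass(frozen=True)
class IncidentFeatures:
    """One incident's features across the seven dimensions."""
    actor_type: str
    target: str
    attack_source: str
    attack_destination: str
    attack_level: str
    attack_type: str
    attack_timeline: str


def _clean(value) -> str:
    # Treat a missing or null value as an empty string.
    return "" if value is None else str(value).strip()


def parse_gpt_reply(reply: str) -> IncidentFeatures:
    """Parse a (hypothetical) JSON reply, rejecting any empty dimension.

    Malformed JSON raises json.JSONDecodeError, a subclass of ValueError.
    """
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [FIELDS[k] for k in FIELDS if not _clean(data.get(k))]
    if missing:
        raise ValueError(f"reply is missing dimensions: {missing}")
    return IncidentFeatures(**{k: _clean(data[k]) for k in FIELDS})


# Placeholder reply for illustration only; not from the paper's dataset.
example = json.dumps({
    "actor_type": "State-sponsored",
    "target": "Government",
    "attack_source": "Country A",
    "attack_destination": "Country B",
    "attack_level": "High",
    "attack_type": "Ransomware",
    "attack_timeline": "2021-05",
})
features = parse_gpt_reply(example)
print(features.attack_type)  # Ransomware
```

Rejecting incomplete replies up front keeps every incident record at exactly seven populated dimensions, which is what makes a per-dimension evaluation such as the 1,498-output count well defined.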
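The reported figures can be checked for internal consistency with simple arithmetic: 214 incidents times 7 dimensions gives the 1,498 outputs, and the F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The snippet below uses only the numbers stated in the abstract; the underlying true/false positive counts are not given there, so none are assumed.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract.
INCIDENTS = 214
DIMENSIONS = 7  # Actor Type, Target, Source, Destination, Level, Type, Timeline
PRECISION = 0.96
RECALL = 0.98

total_outputs = INCIDENTS * DIMENSIONS
f1 = f1_score(PRECISION, RECALL)

print(total_outputs)  # 1498
print(f"{f1:.2f}")    # 0.97
```

The unrounded value is about 0.9699, so the reported 97% F1-score follows directly from the reported precision and recall.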
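The abstract does not define "decomposition analysis". One plausible reading, stated here as an assumption, is a hierarchical drill-down that splits the incident total by one extracted dimension and then splits each branch by another. The sketch below implements that drill-down over placeholder records; the function `drill_down` and the record keys are hypothetical, and the records are not taken from the paper's dataset.

```python
from collections import Counter, defaultdict


def drill_down(records: list[dict[str, str]], first: str, second: str) -> dict:
    """Split incident counts by `first`, then each branch by `second`."""
    tree: defaultdict[str, Counter] = defaultdict(Counter)
    for rec in records:
        tree[rec[first]][rec[second]] += 1
    # Convert to plain dicts; insertion order follows first appearance.
    return {branch: dict(counts) for branch, counts in tree.items()}


# Placeholder records for illustration only.
records = [
    {"attack_type": "Ransomware", "attack_source": "Country A"},
    {"attack_type": "Ransomware", "attack_source": "Country B"},
    {"attack_type": "Ransomware", "attack_source": "Country A"},
    {"attack_type": "Espionage", "attack_source": "Country C"},
]

tree = drill_down(records, "attack_type", "attack_source")
print(tree)
# {'Ransomware': {'Country A': 2, 'Country B': 1}, 'Espionage': {'Country C': 1}}
```

Summing each branch (for example, `sum(tree["Ransomware"].values())`) recovers the per-attack-type total, so each level of the tree partitions the level above it.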