CANAL - 网络活动新闻预警语言模型：经验方法与昂贵的 LLMs 比较

2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC) Pub Date : 2024-02-07 DOI:10.1109/ICAIC60265.2024.10433839

Urjitkumar Patel, Fang-Chun Yeh, Chinmay Gondhalekar

{"title":"CANAL - 网络活动新闻预警语言模型：经验方法与昂贵的 LLMs 比较","authors":"Urjitkumar Patel, Fang-Chun Yeh, Chinmay Gondhalekar","doi":"10.1109/ICAIC60265.2024.10433839","DOIUrl":null,"url":null,"abstract":"In today’s digital landscape, where cyber attacks have become the norm, the detection of cyber attacks and threats is critically imperative across diverse domains. Our research presents a new empirical framework for cyber threat modeling, adept at parsing and categorizing cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL - Cyber Activity News Alerting Language Model, tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero to few-shot learning in cyber news classification. CANAL demonstrates superior performance by outperforming all other LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we introduce the Cyber Signal Discovery module, a strategic component designed to efficiently detect emerging cyber signals from news articles. Collectively, CANAL and Cyber Signal Discovery module equip our framework to provide a robust and cost-effective solution for businesses that require agile responses to cyber intelligence.","PeriodicalId":517265,"journal":{"name":"2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC)","volume":"24 6","pages":"1-12"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CANAL - Cyber Activity News Alerting Language Model : Empirical Approach vs. Expensive LLMs\",\"authors\":\"Urjitkumar Patel, Fang-Chun Yeh, Chinmay Gondhalekar\",\"doi\":\"10.1109/ICAIC60265.2024.10433839\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In today’s digital landscape, where cyber attacks have become the norm, the detection of cyber attacks and threats is critically imperative across diverse domains. Our research presents a new empirical framework for cyber threat modeling, adept at parsing and categorizing cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL - Cyber Activity News Alerting Language Model, tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero to few-shot learning in cyber news classification. CANAL demonstrates superior performance by outperforming all other LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we introduce the Cyber Signal Discovery module, a strategic component designed to efficiently detect emerging cyber signals from news articles. Collectively, CANAL and Cyber Signal Discovery module equip our framework to provide a robust and cost-effective solution for businesses that require agile responses to cyber intelligence.\",\"PeriodicalId\":517265,\"journal\":{\"name\":\"2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC)\",\"volume\":\"24 6\",\"pages\":\"1-12\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAIC60265.2024.10433839\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAIC60265.2024.10433839","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在当今的数字环境中，网络攻击已成为常态，因此在不同领域检测网络攻击和威胁至关重要。我们的研究为网络威胁建模提出了一个新的经验框架，该框架善于解析和分类新闻报道中的网络相关信息，提高市场利益相关者的实时警惕性。该框架的核心是一个经过微调的 BERT 模型，我们称之为 CANAL - 网络活动新闻预警语言模型，它采用随机森林（Random Forest）驱动的新型银标签方法，专为网络分类量身定制。我们将 CANAL 与更大型、成本更高的 LLM（包括 GPT-4、LLaMA 和 Zephyr）进行比较，突出它们在网络新闻分类中从零到几的学习能力。CANAL 在准确性和成本效益方面都优于所有其他 LLM，表现出了卓越的性能。此外，我们还介绍了网络信号发现模块，这是一个战略性组件，旨在从新闻文章中有效地发现新出现的网络信号。总之，CANAL 和网络信号发现模块使我们的框架能够为需要对网络情报做出敏捷反应的企业提供强大而经济高效的解决方案。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CANAL - Cyber Activity News Alerting Language Model : Empirical Approach vs. Expensive LLMs

In today’s digital landscape, where cyber attacks have become the norm, the detection of cyber attacks and threats is critically imperative across diverse domains. Our research presents a new empirical framework for cyber threat modeling, adept at parsing and categorizing cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL - Cyber Activity News Alerting Language Model, tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero to few-shot learning in cyber news classification. CANAL demonstrates superior performance by outperforming all other LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we introduce the Cyber Signal Discovery module, a strategic component designed to efficiently detect emerging cyber signals from news articles. Collectively, CANAL and Cyber Signal Discovery module equip our framework to provide a robust and cost-effective solution for businesses that require agile responses to cyber intelligence.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC)

自引率

0.00%

发文量