Xuewei Li, Zengyang Zheng, Mankun Zhao, Yue Zhao, Lifeng Shi, Baoliang Wang
Journal: Computer Networks, Volume 268, Article 111341. Published: 2025-05-22. DOI: 10.1016/j.comnet.2025.111341. URL: https://www.sciencedirect.com/science/article/pii/S1389128625003081. Impact factor: 4.6 (JCR Q1, Computer Science, Hardware & Architecture).
RLFE-IDS: A framework of Intrusion Detection System based on Retrieval Augmented Generation and Large Language Model
Intrusion Detection Systems (IDS), a key defense measure in network security, often struggle to handle unknown attacks or variants of known attacks. This challenge is exacerbated by the poor generalization of deep learning models. To enhance the adaptability of IDS, this article introduces a framework called LLM-IDS, which explores the feasibility of leveraging Large Language Models (LLMs) for intrusion detection because of their strong generalization capabilities. However, deploying LLMs is difficult. Moreover, since most LLMs are designed primarily for Natural Language Processing (NLP) tasks, significant mismatches arise when they are naively adapted to intrusion detection. To address both problems, this article introduces a novel framework called RLFE-IDS, comprising two key modules: Retrieval-Augmented Generation (RAG) and an embedding model called FE-Net. RAG employs a vector database to store network data alongside their corresponding vector representations. With the RAG framework, LLMs can be called directly through an Application Programming Interface (API), which alleviates the difficulty of deploying them. The embedding model, FE-Net, bridges the semantic gap between text data and network data. Upon receiving new network data, RLFE-IDS uses RAG to query the database for the most relevant network data, which is then fed into the LLM for classification. This article validates the approach through experiments on four datasets and deploys RLFE-IDS in a real network environment. Experiments show that the optimal accuracy of LLM-IDS is 99.36%, and that of RLFE-Net is 98.56%. The results demonstrate not only the feasibility of applying LLMs to intrusion detection, but also the robustness and superior performance of RLFE-IDS.
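The retrieve-then-classify flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function is a hypothetical stand-in for FE-Net (it only L2-normalises a feature vector), `VectorStore` is a toy in-memory substitute for a vector database, the feature values and labels are invented for illustration, and the LLM API call is omitted, so the sketch stops at building the prompt that would be sent to it.

```python
import math


def embed(features):
    """Hypothetical stand-in for FE-Net: L2-normalise the raw feature vector."""
    norm = math.sqrt(sum(x * x for x in features))
    return [x / norm for x in features] if norm > 0 else list(features)


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))


class VectorStore:
    """Toy in-memory vector database: stores records next to their vectors."""

    def __init__(self):
        self.entries = []  # list of (vector, record) pairs

    def add(self, features, label):
        record = {"features": list(features), "label": label}
        self.entries.append((embed(features), record))

    def query(self, features, k=3):
        """Return the k stored records most similar to `features`."""
        q = embed(features)
        scored = [(record, cosine(vec, q)) for vec, record in self.entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def build_prompt(features, neighbours):
    """Assemble the retrieved records and the new flow into an LLM prompt."""
    lines = ["Labelled network flows retrieved as context:"]
    for record, sim in neighbours:
        lines.append(
            f"- features={record['features']} label={record['label']} "
            f"(similarity={sim:.3f})"
        )
    lines.append(f"Classify this new flow: features={list(features)}")
    lines.append("Answer with a single label.")
    return "\n".join(lines)


# Illustrative data only; real flows would carry many more features.
store = VectorStore()
store.add([0.1, 0.2, 0.0], "benign")
store.add([0.9, 0.8, 0.7], "dos")
store.add([0.2, 0.1, 0.0], "benign")

new_flow = [0.85, 0.9, 0.6]
neighbours = store.query(new_flow, k=2)
prompt = build_prompt(new_flow, neighbours)
print(prompt)  # this text would be sent to an LLM through its API
```

In this arrangement the LLM needs no fine-tuning or local deployment: the labelled neighbours supply the task-specific context, which is the role the abstract assigns to RAG.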
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.