RLFE-IDS: A framework of Intrusion Detection System based on Retrieval Augmented Generation and Large Language Model

IF 4.6, Zone 2 (Computer Science), JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Computer Networks, Pub Date: 2025-05-22, DOI: 10.1016/j.comnet.2025.111341

Xuewei Li, Zengyang Zheng, Mankun Zhao, Yue Zhao, Lifeng Shi, Baoliang Wang

Volume 268, Article 111341. Full text: https://www.sciencedirect.com/science/article/pii/S1389128625003081

Citations: 0

Abstract

Intrusion Detection Systems (IDS) play a critical role in network security as a key defense measure, yet they often struggle to handle unknown attacks or variations of known attacks. This challenge is exacerbated by the poor generalization of deep learning models. To enhance the adaptability of IDS, this article introduces a framework called LLM-IDS, which explores the feasibility of leveraging Large Language Models (LLMs) for intrusion detection, owing to their strong generalization capabilities. However, deploying LLMs is difficult. Moreover, since most LLMs are primarily designed for Natural Language Processing (NLP) tasks, significant mismatches arise when they are naively adapted to intrusion detection tasks. To address these issues, this article introduces a novel framework called RLFE-IDS, comprising two key modules: Retrieval-Augmented Generation (RAG) and an embedding model called FE-Net. RAG employs a vector database to store network data alongside their corresponding vector representations. With the RAG framework, LLMs can be called directly through an Application Programming Interface (API), which alleviates the difficulty of deploying them. The embedding model, FE-Net, bridges the semantic gap between text data and network data. Upon receiving new network data, RLFE-IDS uses RAG to query the database for the most relevant network data, which is then fed into the LLM for classification. This article validates the approach through experiments on four datasets and deploys RLFE-IDS in a real network environment. Experiments show that the optimal accuracy of LLM-IDS is 99.36%, and that of RLFE-Net is 98.56%. The results demonstrate not only the feasibility of applying LLMs to intrusion detection, but also the robustness and superior performance of RLFE-IDS.
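The retrieve-then-classify flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `embed` function is a trivial placeholder and not FE-Net, the three-value feature vectors and the labels are invented, the in-memory `VectorStore` stands in for whatever vector database the authors use, and the LLM API call is left out because the abstract does not name a provider. The sketch only shows the general pattern: store network records with their vectors, retrieve the most similar ones for a new record, and assemble them into a prompt for an LLM.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    features: list  # raw flow features (hypothetical toy values)
    label: str      # known class of the stored record
    vector: list    # embedding stored alongside the record


def embed(features):
    """Placeholder embedding (NOT FE-Net): L2-normalise the raw features."""
    norm = math.sqrt(sum(x * x for x in features)) or 1.0
    return [x / norm for x in features]


def cosine(a, b):
    # Inputs are unit vectors, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class VectorStore:
    """Tiny in-memory stand-in for a vector database."""

    def __init__(self):
        self.records = []

    def add(self, features, label):
        self.records.append(Record(features, label, embed(features)))

    def query(self, features, k=3):
        # Rank stored records by similarity to the query and keep the top k.
        q = embed(features)
        ranked = sorted(self.records, key=lambda r: cosine(q, r.vector), reverse=True)
        return ranked[:k]


def build_prompt(features, neighbours):
    """Assemble retrieved examples and the new record into an LLM prompt."""
    lines = [
        "Classify the new network flow using the labelled examples below.",
        "Retrieved examples:",
    ]
    for r in neighbours:
        lines.append(f"- features={r.features} label={r.label}")
    lines.append(f"New flow: features={features}")
    lines.append("Answer with the label only.")
    return "\n".join(lines)


store = VectorStore()
store.add([0.1, 0.2, 0.9], "benign")
store.add([0.9, 0.8, 0.1], "dos")
store.add([0.85, 0.9, 0.05], "dos")
store.add([0.2, 0.1, 0.8], "benign")

new_flow = [0.88, 0.82, 0.07]
neighbours = store.query(new_flow, k=2)
prompt = build_prompt(new_flow, neighbours)
print(prompt)  # in a full system this string would be sent to an LLM API
```

In a real deployment the quality of retrieval would depend on the embedding model, which is the role the abstract assigns to FE-Net; the normalisation used here only keeps the example self-contained.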

Source journal

Computer Networks (Engineering & Technology: Telecommunications)

CiteScore: 10.80
Self-citation rate: 3.60%
Articles published: 434
Review time: 8.6 months

Journal description: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.