RAGCacheSim: A discrete-event simulator for evaluating caching strategies in Retrieval-Augmented Generation systems

IF 1.2 | JCR Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Software Impacts, vol. 26, Article 100783 | Pub date: 2025-09-01 | DOI: 10.1016/j.simpa.2025.100783

Hardik Ruparel, Tatsat Patel


Citations: 0

Abstract




Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) with external knowledge retrieval but incur significant compute and latency costs. In distributed RAG deployments, semantically similar queries routed to different nodes — each with its own cache — can lead to redundant processing. We present RAGCacheSim, a discrete-event simulator for evaluating caching strategies such as Centralized Exact-match Cache (CEC), Independent Semantic Caches (IC), and Distributed Semantic Cache Coordination (DSC). It reports metrics like cache hit rate, average query latency, and coordination overhead. Built using SimPy, FastEmbed, and pybloom_live, it helps researchers optimize distributed RAG architectures.
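The abstract names the three strategies but not their internals, so the following is only a toy, stdlib-only sketch of how CEC, IC, and DSC might differ on hit rate, latency, and coordination overhead. It is not RAGCacheSim's implementation or API. Assumptions: a plain `heapq` event queue replaces SimPy (only query-arrival events, no queueing contention), a bag-of-words cosine similarity replaces FastEmbed embeddings, DSC is modeled as "probe every peer cache on a local miss" (no Bloom filters), and the names `simulate`, `SemanticCache`, `HIT_MS`, `MISS_MS`, `PEER_MS`, and `THRESHOLD` are hypothetical, with made-up constants.

```python
import heapq
import math
from collections import Counter

# Hypothetical constants for illustration only; not taken from the paper.
HIT_MS = 5.0       # serve a cached answer
MISS_MS = 200.0    # full retrieval + LLM generation
PEER_MS = 10.0     # one network round trip to probe a peer cache (DSC)
THRESHOLD = 0.8    # cosine similarity needed for a semantic hit


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items())  # Counter returns 0 for absent terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Per-node cache that matches on embedding similarity, not exact text."""

    def __init__(self):
        self.entries = []

    def lookup(self, vec):
        return any(cosine(vec, e) >= THRESHOLD for e in self.entries)

    def insert(self, vec):
        self.entries.append(vec)


def simulate(strategy, arrivals, n_nodes=3):
    """Replay (time, node, query) arrivals in time order through one strategy.

    CEC: one shared exact-match cache. IC: independent per-node semantic
    caches. DSC (assumed model): IC plus probing every peer on a local miss.
    Returns (hit_rate, mean_latency_ms, peer_probes).
    """
    if strategy not in ("CEC", "IC", "DSC"):
        raise ValueError(f"unknown strategy: {strategy}")
    events = list(arrivals)
    heapq.heapify(events)  # event queue ordered by arrival time
    central = set()
    local = [SemanticCache() for _ in range(n_nodes)]
    hits, total_ms, probes, n = 0, 0.0, 0, 0
    while events:
        _, node, query = heapq.heappop(events)
        n += 1
        if strategy == "CEC":
            key = query.strip().lower()
            hit = key in central
            central.add(key)
            latency = HIT_MS if hit else MISS_MS
        else:
            vec = embed(query)
            hit = local[node].lookup(vec)
            latency = HIT_MS if hit else MISS_MS
            if not hit and strategy == "DSC":
                for peer in range(n_nodes):
                    if peer == node:
                        continue
                    probes += 1  # one coordination message per probe
                    if local[peer].lookup(vec):
                        hit, latency = True, HIT_MS + PEER_MS
                        break
                else:
                    # No peer matched: full miss plus every probe's round trip.
                    latency = MISS_MS + PEER_MS * (n_nodes - 1)
            if not hit:
                local[node].insert(vec)  # populate the serving node's cache
        hits += hit
        total_ms += latency
    if n == 0:
        return 0.0, 0.0, 0
    return hits / n, total_ms / n, probes


# A paraphrase on node 0, then a verbatim repeat routed to node 1.
workload = [
    (0.0, 0, "what is retrieval augmented generation"),
    (1.0, 0, "explain what is retrieval augmented generation"),
    (2.0, 1, "what is retrieval augmented generation"),
    (3.0, 2, "how do bloom filters work"),
]
for name in ("CEC", "IC", "DSC"):
    print(name, simulate(name, workload))
# Prints:
# CEC (0.25, 151.25, 0)
# IC (0.25, 151.25, 0)
# DSC (0.5, 115.0, 5)
```

On this four-query workload, CEC catches only the verbatim repeat that lands on node 1, IC catches only the paraphrase on node 0, and the assumed DSC model catches both at the cost of five peer probes. That is the kind of trade-off that hit-rate, latency, and coordination-overhead metrics are meant to surface.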


Source journal: Software Impacts (Software)
CiteScore: 2.70
Self-citation rate: 9.50%
Articles published:
Review time: 16 days