RAGCacheSim: A discrete-event simulator for evaluating caching strategies in Retrieval-Augmented Generation systems
Hardik Ruparel, Tatsat Patel
Software Impacts, vol. 26, Article 100783, published 2025-09-01. DOI: 10.1016/j.simpa.2025.100783
https://www.sciencedirect.com/science/article/pii/S2665963825000430
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) with external knowledge retrieval but incur significant compute and latency costs. In distributed RAG deployments, semantically similar queries may be routed to different nodes, each with its own cache, so the same work is done more than once. We present RAGCacheSim, a discrete-event simulator for evaluating caching strategies such as Centralized Exact-match Cache (CEC), Independent Semantic Caches (IC), and Distributed Semantic Cache Coordination (DSC). It reports metrics such as cache hit rate, average query latency, and coordination overhead. Built using SimPy, FastEmbed, and pybloom_live, it helps researchers compare and tune caching designs for distributed RAG architectures.
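To make the difference between the three strategies concrete, the sketch below replays a small query trace under each one and reports the same three metrics. It is not RAGCacheSim's code or API, and every name and number in it is an assumption made for illustration. Queries are replayed sequentially rather than as SimPy events, a bag-of-words cosine similarity stands in for FastEmbed embeddings, the DSC variant simply asks every peer on a local miss (the abstract does not describe the actual coordination protocol), and the latency constants are invented.

```python
import math
from collections import Counter

# Hypothetical cost constants (milliseconds), chosen only for illustration.
HIT_MS, MISS_MS, PEER_MSG_MS = 5.0, 500.0, 2.0


def embed(text):
    """Toy bag-of-words vector, a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_hit(cache, vec, threshold):
    """True if any cached vector is at least `threshold`-similar to vec."""
    return any(cosine(vec, cached) >= threshold for cached in cache)


def simulate(strategy, queries, num_nodes=2, threshold=0.8):
    """Replay queries round-robin across nodes and return summary metrics."""
    central = set()                          # CEC: one shared exact-match cache
    local = [[] for _ in range(num_nodes)]   # IC / DSC: per-node semantic caches
    hits = messages = 0
    total_latency = 0.0

    for i, query in enumerate(queries):
        node = i % num_nodes
        vec = embed(query)
        peer_msgs = 0

        if strategy == "CEC":
            hit = query in central
            central.add(query)
        else:
            hit = semantic_hit(local[node], vec, threshold)
            if not hit and strategy == "DSC":
                # Local miss: ask every peer, one message each (assumed protocol).
                peer_msgs = num_nodes - 1
                hit = any(semantic_hit(local[p], vec, threshold)
                          for p in range(num_nodes) if p != node)
            if not hit:
                local[node].append(vec)      # cache the newly generated answer

        hits += hit
        messages += peer_msgs
        total_latency += peer_msgs * PEER_MSG_MS + (HIT_MS if hit else MISS_MS)

    n = len(queries)
    return {"hit_rate": hits / n,
            "avg_latency_ms": total_latency / n,
            "coordination_messages": messages}


QUERIES = [
    "what is retrieval augmented generation",
    "what is retrieval augmented generation exactly",
    "how does a bloom filter work",
    "how does a bloom filter work",
    "what is retrieval augmented generation please",
    "how does a bloom filter work",
]

if __name__ == "__main__":
    for name in ("CEC", "IC", "DSC"):
        print(name, simulate(name, QUERIES))
```

On this six-query trace the exact-match cache misses the paraphrased queries and the independent caches miss matches held on the other node, so each hits on 2 of 6 queries. The coordinated variant hits on 4 of 6, at the cost of 5 peer messages. Real results depend on the workload, the embedding model, and the similarity threshold.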