RAGCacheSim: A discrete-event simulator for evaluating caching strategies in Retrieval-Augmented Generation systems
Authors: Hardik Ruparel, Tatsat Patel
Journal: Software Impacts, volume 26, Article 100783 (published 2025-09-01)
DOI: 10.1016/j.simpa.2025.100783
URL: https://www.sciencedirect.com/science/article/pii/S2665963825000430
Abstract
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) with external knowledge retrieval but incur significant compute and latency costs. In distributed RAG deployments, semantically similar queries can be routed to different nodes, each with its own cache, which leads to redundant processing. We present RAGCacheSim, a discrete-event simulator for evaluating caching strategies such as Centralized Exact-match Cache (CEC), Independent Semantic Caches (IC), and Distributed Semantic Cache Coordination (DSC). It reports metrics such as cache hit rate, average query latency, and coordination overhead. Built with SimPy, FastEmbed, and pybloom_live, it helps researchers optimize distributed RAG architectures.
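
To make the difference between exact-match and semantic caching concrete, here is a minimal sketch in plain Python. It is not RAGCacheSim's implementation, and it is not a discrete-event simulation: it only replays a toy workload through two hypothetical cache classes and reports hit rate and average latency. The class names, the 0.9 similarity threshold, the latency constants, and the hand-made 2-D "embeddings" are all assumptions chosen for illustration; a real setup would obtain embeddings from a model such as those provided by FastEmbed.

```python
import math

# Hypothetical latencies in seconds; illustrative values, not from the paper.
HIT_LATENCY = 0.01
MISS_LATENCY = 1.0


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ExactMatchCache:
    """Hits only when the query string was seen verbatim."""

    def __init__(self):
        self.store = {}

    def lookup(self, text, vector):
        return self.store.get(text)

    def insert(self, text, vector, answer):
        self.store[text] = answer


class SemanticCache:
    """Hits when any stored query embedding is similar enough."""

    def __init__(self, threshold=0.9):  # threshold is an assumed value
        self.threshold = threshold
        self.entries = []  # list of (vector, answer) pairs

    def lookup(self, text, vector):
        for stored, answer in self.entries:
            if cosine(stored, vector) >= self.threshold:
                return answer
        return None

    def insert(self, text, vector, answer):
        self.entries.append((vector, answer))


def replay(cache, queries):
    """Replay queries through a cache; return (hit_rate, avg_latency)."""
    hits = 0
    total_latency = 0.0
    for text, vector in queries:
        if cache.lookup(text, vector) is not None:
            hits += 1
            total_latency += HIT_LATENCY
        else:
            total_latency += MISS_LATENCY
            cache.insert(text, vector, "answer for " + text)
    return hits / len(queries), total_latency / len(queries)


# Toy workload with hand-made 2-D vectors standing in for embeddings.
QUERIES = [
    ("what is rag", (1.0, 0.0)),
    ("explain rag", (0.98, 0.2)),          # paraphrase of the first query
    ("what is rag", (1.0, 0.0)),           # exact repeat
    ("bloom filter basics", (0.0, 1.0)),   # unrelated query
]

if __name__ == "__main__":
    print("exact-match:", replay(ExactMatchCache(), QUERIES))
    print("semantic:   ", replay(SemanticCache(), QUERIES))
```

On this toy workload the exact-match cache hits only on the verbatim repeat, while the semantic cache also serves the paraphrase. Comparing several independent `SemanticCache` instances against a coordinated one would be the next step toward the IC versus DSC comparison the abstract describes, though that coordination logic is not sketched here.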