RAGCacheSim: A discrete-event simulator for evaluating caching strategies in Retrieval-Augmented Generation systems

Impact factor: 1.2 | JCR: Q3 (Computer Science, Software Engineering)
Hardik Ruparel, Tatsat Patel
Journal: Software Impacts, Volume 26, Article 100783
Publication date: 2025-09-01
Publication type: Journal Article
DOI: 10.1016/j.simpa.2025.100783
URL: https://www.sciencedirect.com/science/article/pii/S2665963825000430
Citations: 0

Abstract

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) with external knowledge retrieval but incur significant compute and latency costs. In distributed RAG deployments, semantically similar queries routed to different nodes — each with its own cache — can lead to redundant processing. We present RAGCacheSim, a discrete-event simulator for evaluating caching strategies such as Centralized Exact-match Cache (CEC), Independent Semantic Caches (IC), and Distributed Semantic Cache Coordination (DSC). It reports metrics like cache hit rate, average query latency, and coordination overhead. Built using SimPy, FastEmbed, and pybloom_live, it helps researchers optimize distributed RAG architectures.
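The contrast between the three strategies named in the abstract can be illustrated with a toy sketch. This is not RAGCacheSim's code: it uses a standard-library `heapq` event queue in place of SimPy, a word-overlap score in place of FastEmbed embeddings, and omits Bloom filters entirely. The latency constants, the similarity threshold, the round-robin routing, and the names `simulate`, `similarity`, and `QUERIES` are all assumptions made for illustration.

```python
import heapq

# All constants below are illustrative assumptions, not values from the paper.
HIT_LATENCY = 0.05   # seconds to serve a cached answer
MISS_LATENCY = 2.0   # seconds for full retrieval plus generation
PEER_LOOKUP = 0.02   # coordination cost of asking peer caches (DSC only)
THRESHOLD = 0.6      # similarity cutoff for a semantic hit


def similarity(a, b):
    """Jaccard overlap of word sets; a toy stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def semantic_hit(cache, query):
    """True if any cached query is similar enough to the new one."""
    return any(similarity(query, cached) >= THRESHOLD for cached in cache)


def simulate(strategy, arrivals, num_nodes=2):
    """Replay (time, query) arrivals in time order and collect metrics."""
    events = list(arrivals)
    heapq.heapify(events)                    # event queue ordered by arrival time
    central = set()                          # CEC: one shared exact-match cache
    local = [[] for _ in range(num_nodes)]   # IC/DSC: one semantic cache per node
    hits, served, total_latency, coordination = 0, 0, 0.0, 0.0
    while events:
        _, query = heapq.heappop(events)
        node = served % num_nodes            # round-robin routing (assumed)
        served += 1
        overhead = 0.0
        if strategy == "CEC":
            hit = query in central
            central.add(query)
        else:
            hit = semantic_hit(local[node], query)
            if not hit and strategy == "DSC":
                overhead = PEER_LOOKUP       # pay to consult the other nodes
                hit = any(semantic_hit(cache, query)
                          for i, cache in enumerate(local) if i != node)
            if not hit:
                local[node].append(query)    # cache the newly computed answer
        hits += hit
        coordination += overhead
        total_latency += (HIT_LATENCY if hit else MISS_LATENCY) + overhead
    return {
        "hits": hits,
        "hit_rate": hits / served,
        "avg_latency": total_latency / served,
        "coordination_overhead": coordination,
    }


# Hypothetical workload: near-duplicate queries that land on different nodes.
QUERIES = [
    (0.0, "what is retrieval augmented generation"),
    (1.0, "what is retrieval augmented generation exactly"),
    (2.0, "how does a bloom filter work"),
    (3.0, "how does a bloom filter work internally"),
    (4.0, "what is retrieval augmented generation"),
    (5.0, "how does a bloom filter work"),
]

for name in ("CEC", "IC", "DSC"):
    print(name, simulate(name, QUERIES))
```

In this toy workload the independent caches miss paraphrases that were first answered on the other node, while the coordinated variant finds them at the price of a small peer-lookup cost. The numbers say nothing about the paper's own results; they only show the kind of trade-off such a simulator is meant to measure.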
Source journal: Software Impacts
CiteScore: 2.70
Self-citation rate: 9.50%
Articles published: 0
Review time: 16 days