CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding

arXiv - CS - Databases | Pub Date: 2024-08-08 | DOI: arxiv-2408.04678

Sophia Ho, Jinsol Park, Patrick Wang

Citations: 0

Abstract

We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows its datastore to be effectively "compacted". REST is a drafting technique for speculative decoding: it retrieves, from a datastore, exact n-gram matches of the most recent n tokens generated by the target LLM. The key idea of CREST is to store only a subset of the smallest and most common n-grams in the datastore, with the aim of achieving comparable performance using less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. On the HumanEval and MT Bench benchmarks, CREST matches REST's accepted token length with 10.6-13.5x less storage space and achieves a 16.5-17.1% higher acceptance length than REST at the same storage space.
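
To make the retrieval-and-compaction idea concrete, here is a toy Python sketch; it is not the authors' implementation. It assumes the datastore is a plain dict mapping each context n-gram (at most a hypothetical `max_n` tokens long) to the continuations that followed it in a corpus, and it approximates CREST's "smallest and most common n-grams" rule by keeping only the `keep` most frequent (context, continuation) pairs. The names `build_compact_datastore` and `draft`, and the parameters `max_n`, `cont_len`, `keep`, and `top`, are illustrative assumptions. Verification of the drafted tokens by the target LLM is left out.

```python
from collections import Counter, defaultdict


def build_compact_datastore(corpus, max_n=3, cont_len=4, keep=1000):
    """Hypothetical compaction rule (an assumption, not the paper's exact
    criterion): count every (context n-gram, continuation) pair whose
    context has 1..max_n tokens, then keep only the `keep` most frequent."""
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_n + 1):
            # i + n < len(tokens), so every continuation has at least one token.
            for i in range(len(tokens) - n):
                ctx = tuple(tokens[i:i + n])
                cont = tuple(tokens[i + n:i + n + cont_len])
                counts[(ctx, cont)] += 1
    store = defaultdict(list)
    # most_common() yields pairs in descending frequency, so each context's
    # candidate list ends up sorted from most to least frequent.
    for (ctx, cont), freq in counts.most_common(keep):
        store[ctx].append((cont, freq))
    return dict(store)


def draft(store, recent, max_n=3, top=2):
    """Return up to `top` drafted continuations for the latest tokens.
    Tries the longest suffix of `recent` (at most max_n tokens) first and
    backs off to shorter suffixes; returns [] when nothing matches exactly."""
    for n in range(min(max_n, len(recent)), 0, -1):
        ctx = tuple(recent[-n:])
        if ctx in store:
            return [list(cont) for cont, _ in store[ctx][:top]]
    return []


# Usage on a toy corpus (hypothetical data).
corpus = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "c", "d"]]
store = build_compact_datastore(corpus, max_n=2, cont_len=2, keep=5)
print(draft(store, ["x", "a", "b"], max_n=2))  # → [['c', 'd']]
```

In this toy run the corpus yields nine distinct (context, continuation) pairs; with `keep=5` the four pairs seen only once are dropped, so the rarer `["c", "e"]` branch is no longer drafted. In this sketch, `max_n` and `keep` are the knobs that would trade storage for coverage.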