CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding
Sophia Ho, Jinsol Park, Patrick Wang
arXiv - CS - Databases, 2024-08-08
https://doi.org/arxiv-2408.04678
Citations: 0
Abstract
We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign
of REST that allows it to be effectively "compacted". REST is a drafting
technique for speculative decoding based on retrieving exact n-gram matches of
the most recent n tokens generated by the target LLM from a datastore. The key
idea of CREST is to store only a subset of the smallest and most common n-grams
in the datastore, with the aim of achieving comparable performance with less
storage space. We found that storing a subset of n-grams both reduces storage
space and improves performance. CREST matches REST's accepted token length with
10.6-13.5x less storage space and achieves a 16.5-17.1% higher acceptance
length than REST using the same storage space on the HumanEval and MT Bench
benchmarks.
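
The abstract does not specify the datastore layout or the exact selection rule, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes a plain dictionary as the datastore, and the function names (`build_datastore`, `compact`, `draft`) and parameters (`max_n`, `draft_len`, `keep`) are hypothetical. It maps short n-grams to the token spans that followed them in a corpus, keeps only the most frequent (and, on ties, shortest) n-grams, and drafts by looking up the longest matching suffix of the recently generated tokens.

```python
from collections import Counter, defaultdict


def build_datastore(tokens, max_n=2, draft_len=2):
    """Map each n-gram (n <= max_n) to a Counter of the spans that follow it."""
    store = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n - draft_len + 1):
            key = tuple(tokens[i:i + n])
            continuation = tuple(tokens[i + n:i + n + draft_len])
            store[key][continuation] += 1
    return store


def compact(store, keep):
    """Keep the `keep` most frequent n-grams; ties go to the shorter n-gram.

    This ranking is an assumed stand-in for "smallest and most common".
    """
    ranked = sorted(store, key=lambda k: (-sum(store[k].values()), len(k), k))
    return {k: store[k] for k in ranked[:keep]}


def draft(store, recent, max_n=2):
    """Return the most common continuation of the longest matching suffix."""
    for n in range(min(max_n, len(recent)), 0, -1):
        key = tuple(recent[-n:])
        if key in store:
            return store[key].most_common(1)[0][0]
    return ()  # no match: the caller falls back to ordinary decoding


corpus = "a b c a b c a b d".split()
full = build_datastore(corpus)
small = compact(full, keep=3)
print(len(full), len(small))
print(draft(full, ["a", "b"]), draft(small, ["a", "b"]))
```

In this toy corpus the compacted store holds half as many keys as the full one, yet it proposes the same draft for the context `a b`, because the shorter key `b` already determines the continuation. In a real system the drafted tokens would still be verified by the target LLM, as in any speculative decoding scheme.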