STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM
Qijiong Liu, Jieming Zhu, Lu Fan, Zhou Zhao, Xiao-Ming Wu
arXiv:2409.07276 · arXiv - CS - Information Retrieval · 2024-09-11
Abstract
Traditional recommendation models often rely on unique item identifiers (IDs) to distinguish between items, which can hinder their ability to effectively leverage item content information and generalize to long-tail or cold-start items. Recently, semantic tokenization has been proposed as a promising solution that aims to tokenize each item's semantic representation into a sequence of discrete tokens. In this way, it preserves the item's semantics within these tokens and ensures that semantically similar items are represented by similar tokens. These semantic tokens have become fundamental in training generative recommendation models. However, existing generative recommendation methods typically involve multiple sub-models for embedding, quantization, and recommendation, leading to an overly complex system. In this paper, we propose to streamline the semantic tokenization and generative recommendation process with a unified framework, dubbed STORE, which leverages a single large language model (LLM) for both tasks. Specifically, we formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task. All these tasks are framed in a generative manner and trained using a single LLM backbone. Extensive experiments have been conducted to validate the effectiveness of our STORE framework across various recommendation tasks and datasets. We will release the source code and configurations for reproducible research.
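
The abstract frames every objective as sequence generation handled by one LLM backbone. The sketch below illustrates how training examples for three of the named tasks (text-to-token, token-to-text, token-to-token) might be assembled into a shared prompt/target format; the templates, the "<t_xx>" semantic-token notation, and all helper names are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch of STORE-style multi-task example construction.
# Prompt templates, the "<t_xx>" semantic-token format, and all helper
# names are assumptions for illustration; the paper's setup may differ.

from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    item_id: str
    text: str                   # item content (title, description, ...)
    semantic_tokens: List[str]  # discrete tokens produced by semantic tokenization


def text_to_token(item: Item) -> dict:
    """Semantic tokenization: generate an item's semantic tokens from its text."""
    return {
        "prompt": f"Tokenize the item: {item.text}\nTokens:",
        "target": " ".join(item.semantic_tokens),
    }


def token_to_text(item: Item) -> dict:
    """Reconstruction: recover the item text from its semantic tokens."""
    return {
        "prompt": f"Describe the item with tokens {' '.join(item.semantic_tokens)}\nText:",
        "target": item.text,
    }


def token_to_token(history: List[Item], target: Item) -> dict:
    """Generative recommendation: predict the next item's tokens from the history's tokens."""
    hist = " | ".join(" ".join(it.semantic_tokens) for it in history)
    return {
        "prompt": f"User history: {hist}\nNext item tokens:",
        "target": " ".join(target.semantic_tokens),
    }


if __name__ == "__main__":
    a = Item("a", "wireless noise-cancelling headphones", ["<t_12>", "<t_87>", "<t_03>"])
    b = Item("b", "bluetooth over-ear headphones", ["<t_12>", "<t_87>", "<t_44>"])
    # All task types share one prompt/target format, so a single LLM backbone
    # can be fine-tuned on their union with an ordinary next-token objective.
    examples = [text_to_token(a), token_to_text(a), token_to_token([a], b)]
    for ex in examples:
        print(ex["prompt"], "->", ex["target"])

Because every task reduces to the same next-token objective over mixed prompt/target pairs, no separate embedding or quantization sub-model is needed at training time, which is the simplification the abstract emphasizes.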