{"title":"Semformer:具有语义规划功能的转换器语言模型","authors":"Yongjing Yin, Junran Ding, Kai Song, Yue Zhang","doi":"arxiv-2409.11143","DOIUrl":null,"url":null,"abstract":"Next-token prediction serves as the dominant component in current neural\nlanguage models. During the training phase, the model employs teacher forcing,\nwhich predicts tokens based on all preceding ground truth tokens. However, this\napproach has been found to create shortcuts, utilizing the revealed prefix to\nspuriously fit future tokens, potentially compromising the accuracy of the\nnext-token predictor. In this paper, we introduce Semformer, a novel method of\ntraining a Transformer language model that explicitly models the semantic\nplanning of response. Specifically, we incorporate a sequence of planning\ntokens into the prefix, guiding the planning token representations to predict\nthe latent semantic representations of the response, which are induced by an\nautoencoder. In a minimal planning task (i.e., graph path-finding), our model\nexhibits near-perfect performance and effectively mitigates shortcut learning,\na feat that standard training methods and baseline models have been unable to\naccomplish. Furthermore, we pretrain Semformer from scratch with 125M\nparameters, demonstrating its efficacy through measures of perplexity,\nin-context learning, and fine-tuning on summarization tasks.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"50 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semformer: Transformer Language Models with Semantic Planning\",\"authors\":\"Yongjing Yin, Junran Ding, Kai Song, Yue Zhang\",\"doi\":\"arxiv-2409.11143\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Next-token prediction serves as the dominant component in current neural\\nlanguage models. During the training phase, the model employs teacher forcing,\\nwhich predicts tokens based on all preceding ground truth tokens. However, this\\napproach has been found to create shortcuts, utilizing the revealed prefix to\\nspuriously fit future tokens, potentially compromising the accuracy of the\\nnext-token predictor. In this paper, we introduce Semformer, a novel method of\\ntraining a Transformer language model that explicitly models the semantic\\nplanning of response. Specifically, we incorporate a sequence of planning\\ntokens into the prefix, guiding the planning token representations to predict\\nthe latent semantic representations of the response, which are induced by an\\nautoencoder. In a minimal planning task (i.e., graph path-finding), our model\\nexhibits near-perfect performance and effectively mitigates shortcut learning,\\na feat that standard training methods and baseline models have been unable to\\naccomplish. 
Furthermore, we pretrain Semformer from scratch with 125M\\nparameters, demonstrating its efficacy through measures of perplexity,\\nin-context learning, and fine-tuning on summarization tasks.\",\"PeriodicalId\":501030,\"journal\":{\"name\":\"arXiv - CS - Computation and Language\",\"volume\":\"50 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computation and Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11143\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semformer: Transformer Language Models with Semantic Planning
Next-token prediction is the dominant training objective in current neural language models. During training, the model employs teacher forcing, predicting each token conditioned on all preceding ground-truth tokens. However, this approach has been found to create shortcuts: the model exploits the revealed prefix to spuriously fit future tokens, potentially compromising the accuracy of the next-token predictor. In this paper, we introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of the response. Specifically, we incorporate a sequence of planning tokens into the prefix and guide the planning-token representations to predict the latent semantic representations of the response, which are induced by an autoencoder. On a minimal planning task (graph path-finding), our model exhibits near-perfect performance and effectively mitigates shortcut learning, a feat that standard training methods and baseline models have been unable to accomplish. Furthermore, we pretrain Semformer from scratch with 125M parameters, demonstrating its efficacy through measures of perplexity, in-context learning, and fine-tuning on summarization tasks.
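
Below is a minimal sketch of how such a planning objective could be set up, assuming a small GPT-2-style decoder from Hugging Face Transformers. The class name SemformerSketch, the number of planning tokens, the mean-pooling of planning-token states, and the MSE regression onto autoencoder latents are illustrative assumptions, not the paper's released implementation; the autoencoder that produces response_latents is assumed to be trained separately and frozen, and is not shown here.

```python
# Sketch of a Semformer-style training objective: standard next-token loss on the
# response, plus a planning loss that pushes planning-token states toward the
# latent representation of the response produced by a (frozen) autoencoder.
# All hyperparameters and names below are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2LMHeadModel


class SemformerSketch(nn.Module):
    def __init__(self, vocab_size=50257, latent_dim=128, num_plan_tokens=4):
        super().__init__()
        config = GPT2Config(vocab_size=vocab_size, n_layer=4, n_head=4, n_embd=256)
        self.lm = GPT2LMHeadModel(config)
        self.num_plan_tokens = num_plan_tokens
        # Learned embeddings for the planning tokens inserted after the prefix.
        self.plan_embed = nn.Parameter(torch.randn(num_plan_tokens, config.n_embd) * 0.02)
        # Projects planning-token hidden states into the autoencoder's latent space.
        self.to_latent = nn.Linear(config.n_embd, latent_dim)

    def forward(self, prefix_ids, response_ids, response_latents):
        # Build the input as [prefix][planning tokens][response].
        tok_embed = self.lm.transformer.wte
        b = prefix_ids.size(0)
        plan = self.plan_embed.unsqueeze(0).expand(b, -1, -1)
        inputs = torch.cat(
            [tok_embed(prefix_ids), plan, tok_embed(response_ids)], dim=1
        )

        out = self.lm(inputs_embeds=inputs, output_hidden_states=True)
        hidden = out.hidden_states[-1]
        p, k = prefix_ids.size(1), self.num_plan_tokens

        # (1) Next-token loss on the response: position i predicts token i + 1,
        # so response token j is predicted from position p + k + j - 1.
        resp_logits = out.logits[:, p + k - 1 : -1, :]
        lm_loss = nn.functional.cross_entropy(
            resp_logits.reshape(-1, resp_logits.size(-1)), response_ids.reshape(-1)
        )

        # (2) Planning loss: mean-pooled planning-token states regress onto the
        # autoencoder's latent representation of the full response.
        plan_states = hidden[:, p : p + k, :].mean(dim=1)
        plan_loss = nn.functional.mse_loss(self.to_latent(plan_states), response_latents)

        return lm_loss + plan_loss


if __name__ == "__main__":
    # Toy shapes: batch of 2, prefix length 8, response length 16, latent dim 128.
    model = SemformerSketch()
    loss = model(
        torch.randint(0, 50257, (2, 8)),
        torch.randint(0, 50257, (2, 16)),
        torch.randn(2, 128),
    )
    print(loss.item())
```

In this sketch the planning loss only supervises the planning-token positions, so the next-token predictor cannot satisfy it by copying from the revealed prefix; how the two losses are weighted and how the response latents are produced are design choices left to the autoencoder setup.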