Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking
Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji
arXiv - CS - Computation and Language, 2024-09-18. https://doi.org/arxiv-2409.12059
Abstract
Large Language Models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Several recent studies have sought to enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, motivated by cognitive mechanisms in the natural world, we design a novel model architecture called TaS, which first considers the thoughts and then expresses the response to the query. We design several pipelines to annotate or generate thought contents from prompt-response samples, then add language heads in a middle layer which behaves as the thinking layer. We train the language model on the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
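
The abstract describes attaching an extra language head to an intermediate transformer layer so that thought tokens are decoded there while the top layer decodes the response. The PyTorch sketch below illustrates that dual-head, dual-loss idea; the layer index, loss weighting, head sharing, and all parameter names are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a "thinking layer": a second LM head on an intermediate
# transformer layer, trained on thought tokens, while the final layer keeps
# the usual response head. All hyperparameters here are placeholders.
import torch
import torch.nn as nn


class DualLayerLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=8, n_heads=8, think_layer=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.think_layer = think_layer                      # index of the "thinking" layer
        self.think_head = nn.Linear(d_model, vocab_size)    # language head for thoughts
        self.speak_head = nn.Linear(d_model, vocab_size)    # language head for responses

    def forward(self, input_ids):
        # Causal mask so both heads decode autoregressively.
        mask = nn.Transformer.generate_square_subsequent_mask(input_ids.size(1)).to(input_ids.device)
        h = self.embed(input_ids)
        think_logits = None
        for i, layer in enumerate(self.layers):
            h = layer(h, src_mask=mask)
            if i + 1 == self.think_layer:
                think_logits = self.think_head(h)           # decode thoughts mid-network
        speak_logits = self.speak_head(h)                    # decode the response at the top
        return think_logits, speak_logits


def dual_loss(think_logits, speak_logits, thought_labels, response_labels, alpha=0.5):
    # Thought tokens supervise the middle-layer head; response tokens supervise
    # the top head. Positions outside each segment are masked with -100.
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    loss_think = ce(think_logits.flatten(0, 1), thought_labels.flatten())
    loss_speak = ce(speak_logits.flatten(0, 1), response_labels.flatten())
    return alpha * loss_think + (1 - alpha) * loss_speak
```

At inference time, the same two heads could realize "simultaneously thinking and speaking" by decoding thought tokens from the middle-layer head before or alongside response tokens from the top head; the exact decoding schedule above is an assumption, not the paper's specification.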