Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking
Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji
arXiv - CS - Computation and Language, 2024-09-18. https://doi.org/arxiv-2409.12059
Abstract
Large Language Models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Several recent studies have sought to enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, motivated by cognitive mechanisms in the natural world, we design a novel model architecture called TaS, which first considers the thoughts and then expresses the response to the query. We design several pipelines to annotate or generate thought contents from prompt-response samples, then add language heads in a middle layer which behaves as the thinking layer. We train the language model on the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
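
The abstract describes attaching an extra language head to an intermediate transformer layer so that thought tokens are decoded there while the top layer decodes the response. The PyTorch sketch below illustrates that dual-head, dual-loss idea; the layer index, loss weighting, head sharing, and all parameter names are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a "thinking layer": a second LM head on an intermediate
# transformer layer, trained on thought tokens, while the final layer keeps
# the usual response head. All hyperparameters here are placeholders.
import torch
import torch.nn as nn


class DualLayerLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=8, n_heads=8, think_layer=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.think_layer = think_layer                      # index of the "thinking" layer
        self.think_head = nn.Linear(d_model, vocab_size)    # language head for thoughts
        self.speak_head = nn.Linear(d_model, vocab_size)    # language head for responses

    def forward(self, input_ids):
        # Causal mask so both heads decode autoregressively.
        mask = nn.Transformer.generate_square_subsequent_mask(input_ids.size(1)).to(input_ids.device)
        h = self.embed(input_ids)
        think_logits = None
        for i, layer in enumerate(self.layers):
            h = layer(h, src_mask=mask)
            if i + 1 == self.think_layer:
                think_logits = self.think_head(h)           # decode thoughts mid-network
        speak_logits = self.speak_head(h)                    # decode the response at the top
        return think_logits, speak_logits


def dual_loss(think_logits, speak_logits, thought_labels, response_labels, alpha=0.5):
    # Thought tokens supervise the middle-layer head; response tokens supervise
    # the top head. Positions outside each segment are masked with -100.
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    loss_think = ce(think_logits.flatten(0, 1), thought_labels.flatten())
    loss_speak = ce(speak_logits.flatten(0, 1), response_labels.flatten())
    return alpha * loss_think + (1 - alpha) * loss_speak
```

At inference time, the same two heads could realize "simultaneously thinking and speaking" by decoding thought tokens from the middle-layer head before or alongside response tokens from the top head; the exact decoding schedule above is an assumption, not the paper's specification.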