Chain-of-Thought Prompting for Speech Translation

Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg
arXiv - CS - Computation and Language, published 2024-09-17. DOI: arxiv-2409.11538 (https://doi.org/arxiv-2409.11538)

Abstract

Large language models (LLMs) have advanced markedly in language understanding and generation. Building on the success of text-based LLMs, recent research has adapted these models to accept speech embeddings as prompts, yielding Speech-LLM models that perform strongly in automatic speech recognition (ASR) and automatic speech translation (AST). In this work, we propose a novel approach that uses ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM consists of a speech encoder and an encoder-decoder Megatron-T5. The model first decodes the speech to generate an ASR transcript, then uses that transcript together with the encoded speech as the prompt for translation, so that speech translation is guided by a two-step process similar to chain-of-thought (CoT) prompting. Low-rank adaptation (LoRA) is applied to the T5 LLM for model adaptation and outperforms full model fine-tuning. Experimental results show that the proposed CoT prompting significantly improves AST performance, with an average gain of 2.4 BLEU points across 6 En->X or X->En AST tasks compared to speech prompting alone. Additionally, compared to a related CoT prediction method that predicts a concatenated sequence of ASR and AST transcripts, our method performs better by an average of 2 BLEU points.
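The two-step prompting flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `SpeechLLM` callable interface, the prompt wording in `build_asr_prompt` and `build_ast_prompt`, and the `toy_model` stand-in are all hypothetical, and the abstract does not specify how the transcript and the encoded speech are actually combined inside the model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: a Speech-LLM maps (encoded speech, text prompt) to
# decoded text. The real model (speech encoder + Megatron-T5) is not
# reproduced here; any callable with this shape can be plugged in.
SpeechLLM = Callable[[Sequence[float], str], str]


@dataclass
class CoTResult:
    transcript: str   # step 1 output (ASR hypothesis)
    translation: str  # step 2 output (AST hypothesis)


def build_asr_prompt(src_lang: str) -> str:
    # Assumed prompt wording, for illustration only.
    return f"Transcribe the {src_lang} speech."


def build_ast_prompt(src_lang: str, tgt_lang: str, transcript: str) -> str:
    # Assumed prompt wording: the ASR transcript is placed in the text prompt
    # so the translation step can condition on it.
    return (f"Translate the {src_lang} speech into {tgt_lang}. "
            f"Transcript: {transcript}")


def cot_translate(model: SpeechLLM, speech: Sequence[float],
                  src_lang: str, tgt_lang: str) -> CoTResult:
    # Step 1: decode the speech into an ASR transcript.
    transcript = model(speech, build_asr_prompt(src_lang))
    # Step 2: prompt again with the same encoded speech plus the transcript.
    translation = model(
        speech, build_ast_prompt(src_lang, tgt_lang, transcript))
    return CoTResult(transcript, translation)


def toy_model(speech: Sequence[float], prompt: str) -> str:
    """Stand-in model so the sketch runs without any ML dependencies."""
    if prompt.startswith("Transcribe"):
        return "hello world"
    transcript = prompt.split("Transcript: ", 1)[1]
    lexicon = {"hello": "hallo", "world": "welt"}
    return " ".join(lexicon.get(word, word) for word in transcript.split())


if __name__ == "__main__":
    result = cot_translate(toy_model, [0.1, 0.2], "English", "German")
    print(result.transcript)
    print(result.translation)
```

The speech-prompting-only baseline mentioned in the abstract would correspond to a single call to the model with a translation prompt and no transcript; the sketch differs only in the extra ASR pass and the transcript carried into the second prompt.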