Improving Non-Autoregressive Translation Quality With Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC

IF 5.1 | Zone 2, Computer Science | Q1, ACOUSTICS

IEEE/ACM Transactions on Audio, Speech, and Language Processing | Pub Date: 2024-09-12 | DOI: 10.1109/TASLP.2024.3451977

Shen-sian Syu;Juncheng Xie;Hung-yi Lee

Volume 32, pages 4121-4133 | Journal Article | https://ieeexplore.ieee.org/document/10679261/

Citations: 0

Abstract

Non-autoregressive approaches, especially those that generate output in a one-pass forward manner, have shown great potential in improving the inference speed of translation models. However, these approaches often suffer from a significant drop in translation quality compared to autoregressive models (AT). To tackle this challenge, this paper introduces a series of innovative techniques to enhance the translation quality of non-autoregressive neural machine translation (NAT) models while still maintaining a substantial acceleration in inference speed. Specifically, we propose a method called CTCPMLM, which involves fine-tuning Pretrained Multilingual Language Models (PMLMs) with the Connectionist Temporal Classification (CTC) loss to effectively train NAT models. Additionally, we adopt the MASK insertion scheme instead of token duplication for up-sampling and present an embedding distillation method to further enhance the performance of NAT models. In our experiments, CTCPMLM surpasses the performance of the baseline autoregressive model (Transformer base) on various datasets, including WMT'14 DE $\leftrightarrow$ EN, WMT'16 RO $\leftrightarrow$ EN, and IWSLT'14 DE $\leftrightarrow$ EN. Moreover, CTCPMLM represents the current state-of-the-art among NAT models. Notably, our model achieves superior results compared to the baseline autoregressive model on the IWSLT'14 En $\leftrightarrow$ De and WMT'16 En $\leftrightarrow$ Ro datasets, even without using distillation data during training. Particularly, on the IWSLT'14 DE $\rightarrow$ EN dataset, our model achieves an impressive BLEU score of 39.93, surpassing AT models and establishing a new state-of-the-art. Additionally, our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
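The abstract contrasts two up-sampling schemes for the CTC input (token duplication versus MASK insertion) without giving implementation details. The following is only a minimal sketch of the general idea, not the authors' code. It assumes that each source token is followed by `ratio - 1` mask tokens, which may differ from the exact placement used in the paper. The names `upsample_by_duplication`, `upsample_by_mask_insertion`, `ctc_collapse`, `MASK` and `BLANK` are hypothetical and chosen for illustration. The collapse function shows the standard CTC rule (merge adjacent repeats, then drop blanks) that maps the longer up-sampled output back to a translation.

```python
from itertools import groupby

BLANK = "<blank>"  # CTC blank symbol (name chosen for this sketch)
MASK = "[MASK]"    # mask token; the real token depends on the pretrained model


def upsample_by_duplication(tokens, ratio):
    """Baseline scheme: repeat every source token `ratio` times."""
    return [tok for tok in tokens for _ in range(ratio)]


def upsample_by_mask_insertion(tokens, ratio):
    """Assumed MASK insertion: keep each token once, then add `ratio - 1` masks."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend([MASK] * (ratio - 1))
    return out


def ctc_collapse(frames):
    """Standard CTC collapsing: merge adjacent repeats, then remove blanks."""
    merged = [tok for tok, _ in groupby(frames)]
    return [tok for tok in merged if tok != BLANK]


if __name__ == "__main__":
    src = ["Guten", "Morgen"]
    print(upsample_by_duplication(src, 2))
    print(upsample_by_mask_insertion(src, 2))
    print(ctc_collapse(["Good", "Good", BLANK, "morning"]))
```

Both schemes produce an input of the same length, so the CTC output layer has the same number of positions to fill; the difference is that the mask positions carry no copied source content, which matches the masked-token inputs a pretrained masked language model saw during pretraining.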


Source journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing (ACOUSTICS; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 11.30

Self-citation rate: 11.10%

Publication count: 217

About the journal: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering, and document indexing and retrieval, as well as general language modeling.