Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding

Yi Zhu, Zexun Wang, Hang Liu, Pei-Hsin Wang, Mingchao Feng, Meng Chen, Xiaodong He
{"title":"基于多粒度对齐的跨模态迁移学习用于端到端口语理解","authors":"Yi Zhu, Zexun Wang, Hang Liu, Pei-Hsin Wang, Mingchao Feng, Meng Chen, Xiaodong He","doi":"10.21437/interspeech.2022-11378","DOIUrl":null,"url":null,"abstract":"End-to-end spoken language understanding (E2E-SLU) has witnessed impressive improvements through cross-modal (text-to-audio) transfer learning. However, current methods mostly focus on coarse-grained sequence-level text-to-audio knowledge transfer with simple loss, and neglecting the fine-grained temporal alignment between the two modalities. In this work, we propose a novel multi-grained cross-modal transfer learning framework for E2E-SLU. Specifically, we devise a cross attention module to align the tokens of text with the frame features of speech, encouraging the model to target at the salient acoustic features attended to each token during transferring the semantic information. We also leverage contrastive learning to facilitate cross-modal representation learning in sentence level. Finally, we explore various data augmentation methods to mitigate the deficiency of large amount of labelled data for the training of E2E-SLU. Extensive experiments are conducted on both English and Chinese SLU datasets to verify the effectiveness of our proposed approach. Experimental results and detailed analyses demonstrate the superiority and competitiveness of our model.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"1131-1135"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding\",\"authors\":\"Yi Zhu, Zexun Wang, Hang Liu, Pei-Hsin Wang, Mingchao Feng, Meng Chen, Xiaodong He\",\"doi\":\"10.21437/interspeech.2022-11378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"End-to-end spoken language understanding (E2E-SLU) has witnessed impressive improvements through cross-modal (text-to-audio) transfer learning. However, current methods mostly focus on coarse-grained sequence-level text-to-audio knowledge transfer with simple loss, and neglecting the fine-grained temporal alignment between the two modalities. In this work, we propose a novel multi-grained cross-modal transfer learning framework for E2E-SLU. Specifically, we devise a cross attention module to align the tokens of text with the frame features of speech, encouraging the model to target at the salient acoustic features attended to each token during transferring the semantic information. We also leverage contrastive learning to facilitate cross-modal representation learning in sentence level. Finally, we explore various data augmentation methods to mitigate the deficiency of large amount of labelled data for the training of E2E-SLU. Extensive experiments are conducted on both English and Chinese SLU datasets to verify the effectiveness of our proposed approach. 
Experimental results and detailed analyses demonstrate the superiority and competitiveness of our model.\",\"PeriodicalId\":73500,\"journal\":{\"name\":\"Interspeech\",\"volume\":\"1 1\",\"pages\":\"1131-1135\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Interspeech\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/interspeech.2022-11378\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-11378","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

End-to-end spoken language understanding (E2E-SLU) has seen impressive improvements through cross-modal (text-to-audio) transfer learning. However, current methods mostly focus on coarse-grained, sequence-level text-to-audio knowledge transfer with a simple loss, neglecting the fine-grained temporal alignment between the two modalities. In this work, we propose a novel multi-grained cross-modal transfer learning framework for E2E-SLU. Specifically, we devise a cross-attention module that aligns text tokens with speech frame features, encouraging the model to focus on the salient acoustic features attended to by each token while transferring semantic information. We also leverage contrastive learning to facilitate cross-modal representation learning at the sentence level. Finally, we explore various data augmentation methods to mitigate the scarcity of labelled data for training E2E-SLU models. Extensive experiments are conducted on both English and Chinese SLU datasets to verify the effectiveness of the proposed approach. Experimental results and detailed analyses demonstrate the superiority and competitiveness of our model.
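
The token-to-frame cross-attention described in the abstract is not released as code here, but a minimal sketch conveys the idea: text token embeddings act as queries over speech frame features, so each token gathers the acoustic frames most relevant to it. The class name, dimensions, and the per-token transfer loss mentioned in the comments are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; module name, sizes, and wiring are assumptions.
import torch
import torch.nn as nn

class TokenFrameCrossAttention(nn.Module):
    """Align text token embeddings (queries) with speech frame
    features (keys/values) via multi-head cross attention."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, token_emb: torch.Tensor, frame_feat: torch.Tensor):
        # token_emb:  (batch, n_tokens, d_model) -- text-side queries
        # frame_feat: (batch, n_frames, d_model) -- speech-side keys/values
        aligned, attn_weights = self.attn(token_emb, frame_feat, frame_feat)
        # `aligned` is a per-token summary of the acoustic frames each token
        # attends to; a fine-grained transfer loss (e.g. MSE against the
        # token embedding) could then be applied per token rather than only
        # once per sequence.
        return aligned, attn_weights

if __name__ == "__main__":
    xattn = TokenFrameCrossAttention()
    tokens = torch.randn(2, 12, 256)    # 12 text tokens per utterance
    frames = torch.randn(2, 300, 256)   # 300 speech frames per utterance
    aligned, w = xattn(tokens, frames)  # aligned: (2, 12, 256)
```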
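Similarly, the sentence-level contrastive objective can be sketched as a symmetric InfoNCE loss over paired text and speech sentence embeddings, a standard formulation for cross-modal contrastive learning; the function name and temperature value below are assumptions, and the paper may use a different variant.

```python
# Sketch of a symmetric InfoNCE loss; details are assumed, not from the paper.
import torch
import torch.nn.functional as F

def sentence_contrastive_loss(text_vec: torch.Tensor,
                              speech_vec: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Matched (text, speech) pairs in the batch are positives;
    every other pairing serves as a negative."""
    text_vec = F.normalize(text_vec, dim=-1)
    speech_vec = F.normalize(speech_vec, dim=-1)
    logits = text_vec @ speech_vec.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Pull each text embedding toward its own utterance, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```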
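The abstract only says "various data augmentation methods" without naming them; SpecAugment-style time and frequency masking is one widely used option for speech inputs, shown here purely as a representative (assumed) example rather than the authors' actual augmentation pipeline.

```python
# Assumed example: SpecAugment-style masking; the paper does not specify
# which augmentations it uses.
import torch

def spec_augment(features: torch.Tensor, n_time_masks: int = 2,
                 n_freq_masks: int = 2, max_t: int = 30,
                 max_f: int = 10) -> torch.Tensor:
    """Zero out random time and frequency bands of a
    (n_frames, n_mels) feature matrix."""
    feats = features.clone()
    n_frames, n_mels = feats.shape
    for _ in range(n_time_masks):
        t = int(torch.randint(0, max_t + 1, (1,)))
        t0 = int(torch.randint(0, max(1, n_frames - t), (1,)))
        feats[t0:t0 + t, :] = 0.0
    for _ in range(n_freq_masks):
        f = int(torch.randint(0, max_f + 1, (1,)))
        f0 = int(torch.randint(0, max(1, n_mels - f), (1,)))
        feats[:, f0:f0 + f] = 0.0
    return feats
```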