Joint Modeling for ASR Correction and Dialog State Tracking

Deyuan Wang, Tiantian Zhang, Caixia Yuan, Xiaojie Wang
DOI: 10.1109/ICASSP49357.2023.10095945
Venue: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Published: 2023-06-04
Citations: 0

Abstract

In spoken dialog systems, transcription errors from Automatic Speech Recognition (ASR) degrade downstream tasks, especially dialog state tracking (DST). Approaches to alleviating such errors use richer information such as word lattices and word confusion networks; however, this information is not always easy to obtain. In addition, large pre-trained language models are trained on plain text, creating a gap between spoken DST and the original pre-trained model. In this paper, we propose a multi-task method that performs DST jointly with ASR correction to improve the performance of both tasks. To do so, we build a MultiWOZ-ASR dataset containing ASR noise for DST and mitigate the gap with a multi-task pre-training framework. Moreover, curriculum learning is adopted to alleviate the difficulty the correction task has in converging during the initial stage of pre-training. Experimental results show that our model achieves significant improvements on the DSTC2 and MultiWOZ-ASR datasets.
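The abstract describes combining a DST loss with an ASR-correction loss under a curriculum schedule that eases the harder correction task into training. A minimal sketch of that idea is below; the function names, the linear ramp, and the `ramp_steps` parameter are all illustrative assumptions, not the authors' actual implementation or schedule.

```python
def curriculum_weight(step: int, ramp_steps: int = 10_000) -> float:
    """Linearly ramp the correction-loss weight from 0 to 1.

    Early in pre-training the weight is near 0, so the hard-to-converge
    correction task contributes little; it reaches full weight by
    `ramp_steps`. (A linear ramp is one common choice; the paper does
    not specify its schedule here.)
    """
    return min(1.0, step / ramp_steps)


def joint_loss(dst_loss: float, corr_loss: float, step: int) -> float:
    """Multi-task objective: L = L_dst + w(step) * L_corr."""
    return dst_loss + curriculum_weight(step) * corr_loss
```

In a real training loop, `dst_loss` and `corr_loss` would be the per-batch losses of the two task heads sharing one encoder, and `step` the global optimizer step.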