Deyuan Wang, Tiantian Zhang, Caixia Yuan, Xiaojie Wang
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Published 2023-06-04. DOI: 10.1109/ICASSP49357.2023.10095945
Joint Modeling for ASR Correction and Dialog State Tracking
In spoken dialog systems, transcription errors in Automatic Speech Recognition (ASR) impact downstream tasks, especially dialog state tracking (DST). Approaches to alleviating such errors involve using richer information such as word lattices and word confusion networks. However, in some cases this information is not easily obtained. In addition, large pre-trained language models are trained on plain text, leaving a gap between spoken DST and the original pre-trained model. In this paper, we propose a multi-task method that performs DST jointly with ASR correction to improve the performance of both tasks. To do so, we build a MultiWOZ-ASR dataset containing ASR noise for DST and mitigate the gap using a multi-task pre-training framework. Moreover, curriculum learning is adopted to address the difficulty of the correction task converging in the initial stage of pre-training. Experimental results show that our model achieves significant improvements on the DSTC2 and MultiWOZ-ASR datasets.
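The multi-task objective with curriculum learning described above could be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the additive loss combination, and the linear warm-up schedule are all assumptions introduced here.

```python
# Hypothetical sketch: combine a DST loss and an ASR-correction loss,
# ramping up the weight of the (harder-to-converge) correction term
# over the first warm-up steps of pre-training.

def curriculum_weight(step: int, warmup_steps: int) -> float:
    """Linearly ramp the correction-loss weight from 0 to 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)

def joint_loss(dst_loss: float, correction_loss: float,
               step: int, warmup_steps: int = 1000) -> float:
    """Down-weight the correction term early so it does not dominate
    training before it has begun to converge."""
    w = curriculum_weight(step, warmup_steps)
    return dst_loss + w * correction_loss
```

With this schedule the model effectively trains on DST alone at step 0 and on the full joint objective once the warm-up ends.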