Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, P. Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun
{"title":"探索无asr的端到端建模,以改进基于云的对话系统中的口语理解","authors":"Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, P. Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun","doi":"10.1109/ASRU.2017.8268987","DOIUrl":null,"url":null,"abstract":"Spoken language understanding (SLU) in dialog systems is generally performed using a natural language understanding (NLU) model based on the hypotheses produced by an automatic speech recognition (ASR) system. However, when new spoken dialog applications are built from scratch in real user environments that often have sub-optimal audio characteristics, ASR performance can suffer due to factors such as the paucity of training data or a mismatch between the training and test data. To address this issue, this paper proposes an ASR-free, end-to-end (E2E) modeling approach to SLU for a cloud-based, modular spoken dialog system (SDS). We evaluate the effectiveness of our approach on crowdsourced data collected from non-native English speakers interacting with a conversational language learning application. Experimental results show that our approach is particularly promising in situations with low ASR accuracy. It can further improve the performance of a sophisticated CNN-based SLU system with more accurate ASR hypotheses by fusing the scores from E2E system, i.e., the overall accuracy of SLU is improved from 85.6% to 86.5%.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"69","resultStr":"{\"title\":\"Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system\",\"authors\":\"Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, P. 
Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun\",\"doi\":\"10.1109/ASRU.2017.8268987\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Spoken language understanding (SLU) in dialog systems is generally performed using a natural language understanding (NLU) model based on the hypotheses produced by an automatic speech recognition (ASR) system. However, when new spoken dialog applications are built from scratch in real user environments that often have sub-optimal audio characteristics, ASR performance can suffer due to factors such as the paucity of training data or a mismatch between the training and test data. To address this issue, this paper proposes an ASR-free, end-to-end (E2E) modeling approach to SLU for a cloud-based, modular spoken dialog system (SDS). We evaluate the effectiveness of our approach on crowdsourced data collected from non-native English speakers interacting with a conversational language learning application. Experimental results show that our approach is particularly promising in situations with low ASR accuracy. 
It can further improve the performance of a sophisticated CNN-based SLU system with more accurate ASR hypotheses by fusing the scores from E2E system, i.e., the overall accuracy of SLU is improved from 85.6% to 86.5%.\",\"PeriodicalId\":290868,\"journal\":{\"name\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"69\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2017.8268987\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268987","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system
Spoken language understanding (SLU) in dialog systems is generally performed using a natural language understanding (NLU) model based on the hypotheses produced by an automatic speech recognition (ASR) system. However, when new spoken dialog applications are built from scratch in real user environments that often have sub-optimal audio characteristics, ASR performance can suffer due to factors such as the paucity of training data or a mismatch between the training and test data. To address this issue, this paper proposes an ASR-free, end-to-end (E2E) modeling approach to SLU for a cloud-based, modular spoken dialog system (SDS). We evaluate the effectiveness of our approach on crowdsourced data collected from non-native English speakers interacting with a conversational language learning application. Experimental results show that our approach is particularly promising in situations with low ASR accuracy. Moreover, fusing the scores from the E2E system can further improve a sophisticated CNN-based SLU system that operates on more accurate ASR hypotheses, raising the overall SLU accuracy from 85.6% to 86.5%.
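The fusion described above combines per-intent scores from the ASR-based CNN system and the ASR-free E2E system. The abstract does not specify the fusion rule or weighting, so the linear interpolation and the weight `alpha` below are illustrative assumptions, not the authors' actual method:

```python
import numpy as np

def fuse_scores(cnn_scores, e2e_scores, alpha=0.7):
    """Fuse per-intent scores from two SLU systems by linear interpolation.

    alpha weights the ASR+CNN system; (1 - alpha) weights the ASR-free
    E2E system. Both the interpolation rule and the weight value are
    hypothetical here; the paper only states that score fusion was used.
    Returns the index of the winning intent and the fused score vector.
    """
    cnn = np.asarray(cnn_scores, dtype=float)
    e2e = np.asarray(e2e_scores, dtype=float)
    fused = alpha * cnn + (1.0 - alpha) * e2e
    return int(np.argmax(fused)), fused

# Example: the CNN system is nearly tied between intents 0 and 1,
# and the confident E2E system tips the decision toward intent 1.
best, fused = fuse_scores([0.40, 0.35, 0.25], [0.10, 0.80, 0.10], alpha=0.5)
```

With equal weights, the fused scores are [0.25, 0.575, 0.175], so intent 1 wins even though the CNN system alone preferred intent 0. This is the usual motivation for score-level fusion: the two systems make different errors, so combining them can outperform either alone.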