Zhengyu Zhou, I. G. Choi, Yongliang He, Vikas Yadav, Chin-Hui Lee
{"title":"Using Paralinguistic Information to Disambiguate User Intentions for Distinguishing Phrase Structure and Sarcasm in Spoken Dialog Systems","authors":"Zhengyu Zhou, I. G. Choi, Yongliang He, Vikas Yadav, Chin-Hui Lee","doi":"10.1109/SLT48900.2021.9383505","DOIUrl":null,"url":null,"abstract":"This paper aims at utilizing paralinguistic information usually hidden in speech signals, such as pitch, short pause and sarcasm, to disambiguate user intention not easily distinguishable from speech recognition and natural language understanding results provided by a state-of-the-art spoken dialog system (SDS). We propose two methods to address the ambiguities in understanding name entities and sentence structures based on relevant speech cues and nuances. We also propose an approach to capturing sarcasm in speech and generating sarcasm-sensitive responses using an end-to-end neural network. An SDS prototype that directly feeds signal information into the understanding and response generation components has also been developed to support the three proposed applications. We have achieved encouraging experimental results in this initial study, demonstrating the potential of this new research direction.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"20 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT48900.2021.9383505","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This paper aims at utilizing paralinguistic information usually hidden in speech signals, such as pitch, short pause and sarcasm, to disambiguate user intention not easily distinguishable from speech recognition and natural language understanding results provided by a state-of-the-art spoken dialog system (SDS). We propose two methods to address the ambiguities in understanding name entities and sentence structures based on relevant speech cues and nuances. We also propose an approach to capturing sarcasm in speech and generating sarcasm-sensitive responses using an end-to-end neural network. An SDS prototype that directly feeds signal information into the understanding and response generation components has also been developed to support the three proposed applications. We have achieved encouraging experimental results in this initial study, demonstrating the potential of this new research direction.