Yaodong Tang, Yuchen Huang, Zhiyong Wu, H. Meng, Mingxing Xu, Lianhong Cai
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2016-03-20. DOI: 10.1109/ICASSP.2016.7472854 (https://doi.org/10.1109/ICASSP.2016.7472854)
Question detection from acoustic features using recurrent neural network with gated recurrent unit
Question detection is important for many speech applications. Only some parts of a speech utterance provide useful cues for question detection. Previous work on question detection from acoustic features in Mandarin conversation is weak at capturing this temporal context, which a recurrent neural network (RNN) structure can model naturally. In this paper, we investigate recurrent approaches to this problem. Based on the gated recurrent unit (GRU), we build different RNN and bidirectional RNN (BRNN) models to extract effective features at the segment and utterance levels. A particular advantage of the GRU is that it can determine a proper time scale over which to extract high-level contextual features. Experimental results show that features extracted within a proper time scale make the classifier perform better than the baseline method, which uses a pre-designed lexical and acoustic feature set.
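The abstract gives no equations, so the following is only a minimal sketch of how a GRU-based bidirectional RNN could turn a sequence of acoustic frames into one utterance-level feature vector. It uses the standard GRU formulation with update and reset gates. The frame dimension (13), hidden size (8), mean pooling over time, and the random untrained weights are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim, rng):
    """Random, untrained GRU parameters (illustrative assumption only)."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = rng.normal(0.0, 0.1, size=(hidden_dim, input_dim))
        params["U" + gate] = rng.normal(0.0, 0.1, size=(hidden_dim, hidden_dim))
        params["b" + gate] = np.zeros(hidden_dim)
    return params


def gru_step(p, x, h_prev):
    """One GRU step using the standard gate equations."""
    # Update gate: how much of the state is overwritten at this frame.
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])
    # Reset gate: how much past state feeds the candidate state.
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])
    # Candidate state computed from the input and the gated past state.
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    # z near 0 keeps the old state (long time scale);
    # z near 1 replaces it (short time scale).
    return (1.0 - z) * h_prev + z * h_tilde


def run_gru(p, frames):
    """Run a GRU over frames of shape (T, input_dim); returns (T, hidden_dim)."""
    h = np.zeros(p["bz"].shape[0])
    states = []
    for x in frames:
        h = gru_step(p, x, h)
        states.append(h)
    return np.stack(states)


def brnn_utterance_feature(p_fwd, p_bwd, frames):
    """Concatenate forward and backward states, then mean-pool over time."""
    fwd = run_gru(p_fwd, frames)
    # Process the reversed sequence, then restore the original time order.
    bwd = run_gru(p_bwd, frames[::-1])[::-1]
    states = np.concatenate([fwd, bwd], axis=1)
    return states.mean(axis=0)


rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 13))  # 50 hypothetical acoustic frames
p_fwd = init_gru(13, 8, rng)
p_bwd = init_gru(13, 8, rng)
feature = brnn_utterance_feature(p_fwd, p_bwd, frames)
print(feature.shape)
```

The update gate `z` is what lets each hidden unit settle on its own time scale: a unit whose gate stays near zero carries information across many frames, while a unit whose gate opens often tracks only recent context. In a trained system the pooled vector would feed a question/non-question classifier; that stage is omitted here.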