{"title":"主题演讲1:声学信号预处理和声学建模的集成深度学习方法及其在鲁棒自动语音识别中的应用","authors":"Chin-Hui Lee","doi":"10.1109/APSIPA.2017.8281987","DOIUrl":null,"url":null,"abstract":"We cast the classical speech processing problem into a new nonlinear regression setting by mapping log power spectral features of noisy to clean speech based on deep neural networks (DNNs). DNN-enhanced speech obtained by the proposed approach demonstrates better speech quality and intelligibility than those obtained with conventional state-of-the-art algorithms. Furthermore, this new paradigm also facilitates an integrated deep learning framework to train the three key modules in an automatic speech recognition (ASR) system, namely signal conditioning, feature extraction and acoustic phone models, altogether in a unified manner. The proposed framework was tested on recent challenging ASR tasks in CHiME-2, CHiME-4 and REVERB, which are designed to evaluate ASR robustness in mixed speakers, multi-channel, and reverberant conditions. Leveraging upon this new approach, our team scored the lowest word error rates in all three tasks with acoustic pre-processing algorithms for speech separation, microphone array based speech enhancement and speech dereverberation.","PeriodicalId":91399,"journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), ... Asia-Pacific. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"146 1","pages":"v-viii"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Keynote speech 1: An integrated deep learning approach to acoustic signal pre-processing and acoustic modeling with applications to robust automatic speech recognition\",\"authors\":\"Chin-Hui Lee\",\"doi\":\"10.1109/APSIPA.2017.8281987\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We cast the classical speech processing problem into a new nonlinear regression setting by mapping log power spectral features of noisy to clean speech based on deep neural networks (DNNs). DNN-enhanced speech obtained by the proposed approach demonstrates better speech quality and intelligibility than those obtained with conventional state-of-the-art algorithms. Furthermore, this new paradigm also facilitates an integrated deep learning framework to train the three key modules in an automatic speech recognition (ASR) system, namely signal conditioning, feature extraction and acoustic phone models, altogether in a unified manner. The proposed framework was tested on recent challenging ASR tasks in CHiME-2, CHiME-4 and REVERB, which are designed to evaluate ASR robustness in mixed speakers, multi-channel, and reverberant conditions. Leveraging upon this new approach, our team scored the lowest word error rates in all three tasks with acoustic pre-processing algorithms for speech separation, microphone array based speech enhancement and speech dereverberation.\",\"PeriodicalId\":91399,\"journal\":{\"name\":\"Signal and Information Processing Association Annual Summit and Conference (APSIPA), ... Asia-Pacific. 
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference\",\"volume\":\"146 1\",\"pages\":\"v-viii\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Signal and Information Processing Association Annual Summit and Conference (APSIPA), ... Asia-Pacific. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APSIPA.2017.8281987\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), ... Asia-Pacific. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSIPA.2017.8281987","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Keynote speech 1: An integrated deep learning approach to acoustic signal pre-processing and acoustic modeling with applications to robust automatic speech recognition
We cast the classical speech processing problem into a new nonlinear regression setting by mapping the log power spectral features of noisy speech to those of clean speech with deep neural networks (DNNs). DNN-enhanced speech obtained with the proposed approach demonstrates better speech quality and intelligibility than speech obtained with conventional state-of-the-art algorithms. Furthermore, this new paradigm facilitates an integrated deep learning framework in which the three key modules of an automatic speech recognition (ASR) system, namely signal conditioning, feature extraction and acoustic phone models, are trained in a unified manner. The proposed framework was tested on the recent challenging CHiME-2, CHiME-4 and REVERB ASR tasks, which are designed to evaluate ASR robustness under mixed-speaker, multi-channel and reverberant conditions. Leveraging this new approach, our team scored the lowest word error rates in all three tasks, using acoustic pre-processing algorithms for speech separation, microphone-array-based speech enhancement and speech dereverberation.
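To make the regression formulation concrete, the sketch below shows a minimal DNN that maps a window of stacked noisy log-power-spectral (LPS) frames to one clean LPS frame under a mean-squared-error criterion. The layer sizes, context window, FFT size and training setup (EnhancementDNN, CONTEXT, N_FFT) are illustrative assumptions, not the exact configuration described in the keynote.

```python
# Minimal sketch of DNN-based regression from noisy to clean LPS features.
# All hyperparameters here are assumed for illustration only.
import numpy as np
import torch
import torch.nn as nn

N_FFT = 512            # STFT size (assumed)
CONTEXT = 5            # left/right acoustic context in frames (assumed)
DIM = N_FFT // 2 + 1   # LPS dimension per frame

def lps(stft_frames):
    """Log power spectrum of complex STFT frames."""
    return np.log(np.abs(stft_frames) ** 2 + 1e-10)

class EnhancementDNN(nn.Module):
    """Feedforward regressor: stacked noisy LPS frames -> one clean LPS frame."""
    def __init__(self, dim=DIM, context=CONTEXT, hidden=2048):
        super().__init__()
        in_dim = dim * (2 * context + 1)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),    # linear output layer for regression
        )

    def forward(self, x):
        return self.net(x)

def train_step(model, optimizer, noisy_stack, clean_frame):
    """One minimum-mean-square-error update on a (noisy, clean) feature pair."""
    optimizer.zero_grad()
    pred = model(noisy_stack)
    loss = nn.functional.mse_loss(pred, clean_frame)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for parallel noisy/clean training data.
model = EnhancementDNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
noisy = torch.randn(32, DIM * (2 * CONTEXT + 1))  # batch of stacked noisy LPS
clean = torch.randn(32, DIM)                      # matching clean LPS targets
print(train_step(model, opt, noisy, clean))
```

At test time, an enhanced waveform would typically be reconstructed by exponentiating the predicted clean LPS to recover a magnitude spectrum and combining it with the noisy-speech phase in an inverse STFT; the integrated framework in the abstract goes further by training such a pre-processing front end jointly with feature extraction and the acoustic model.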