{"title":"A Generative Auditory Model Embedded Neural Network for Speech Processing","authors":"Yu-Wen Lo, Yih-Liang Shen, Y. Liao, T. Chi","doi":"10.1109/ICASSP.2018.8462690","DOIUrl":null,"url":null,"abstract":"Before the era of the neural network (NN), features extracted from auditory models have been applied to various speech applications and been demonstrated more robust against noise than conventional speech-processing features. What's the role of auditory models in the current NN era? Are they obsolete? To answer this question, we construct a NN with a generative auditory model embedded to process speech signals. The generative auditory model consists of two stages, the stage of spectrum estimation in the logarithmic-frequency axis by the cochlea and the stage of spectral-temporal analysis in the modulation domain by the auditory cortex. The NN is evaluated in a simple speaker identification task. Experiment results show that the auditory model embedded NN is still more robust against noise, especially in low SNR conditions, than the randomly-initialized NN in speaker identification.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"30 1","pages":"5179-5183"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2018.8462690","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Before the era of the neural network (NN), features extracted from auditory models have been applied to various speech applications and been demonstrated more robust against noise than conventional speech-processing features. What's the role of auditory models in the current NN era? Are they obsolete? To answer this question, we construct a NN with a generative auditory model embedded to process speech signals. The generative auditory model consists of two stages, the stage of spectrum estimation in the logarithmic-frequency axis by the cochlea and the stage of spectral-temporal analysis in the modulation domain by the auditory cortex. The NN is evaluated in a simple speaker identification task. Experiment results show that the auditory model embedded NN is still more robust against noise, especially in low SNR conditions, than the randomly-initialized NN in speaker identification.