Preliminary experiments on the robustness of biologically motivated features for DNN-based ASR
F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, Carmen Peláez-Moreno
2015 4th International Work Conference on Bioinspired Intelligence (IWOBI), published 2015-06-10
DOI: https://doi.org/10.1109/IWOBI.2015.7160162
Citations: 0
Abstract
A perceptually motivated feature extraction method that mimics the masking properties of the cochlea has recently been found to improve performance when applied to conventional speech recognition back-ends. Separately, the introduction of Deep Neural Network (DNN) based acoustic models has produced dramatic improvements in performance. In particular, we found that Deep Maxout Networks (DMNs), a modification of the feed-forward DNN architecture that uses a maxout activation function, provide enhanced robustness to environmental noise. In this paper, we present preliminary experiments on the combination of these two elements. The results already show that the DMN-based back-end can take advantage of these auditorily inspired features, making the whole system more robust, and they suggest that human-like representations of speech continue to play an important role in DNN-based automatic speech recognition (ASR) systems.
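For readers unfamiliar with the maxout activation mentioned in the abstract, the following is a minimal sketch of the general idea: each output unit takes the maximum over a small group of linear pre-activations. It is not the authors' implementation. The function names (`affine`, `maxout`, `maxout_layer`), the group size of 2, and the toy weights are all assumptions chosen for illustration, and the sketch uses plain Python lists rather than any particular DNN toolkit.

```python
def affine(weights, bias, inputs):
    """Compute z = W x + b for a weight matrix given as a list of rows."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def maxout(pre_activations, group_size):
    """Return the maximum of each consecutive group of pre-activations."""
    if group_size <= 0 or len(pre_activations) % group_size != 0:
        raise ValueError("length must be a multiple of a positive group_size")
    return [
        max(pre_activations[i:i + group_size])
        for i in range(0, len(pre_activations), group_size)
    ]


def maxout_layer(weights, bias, inputs, group_size):
    """One maxout layer: an affine map followed by a group-wise maximum."""
    return maxout(affine(weights, bias, inputs), group_size)


# Toy example (hypothetical values): 2 inputs, 4 linear pieces, 2 output units.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
bias = [0.0, 0.0, 0.0, 0.0]
outputs = maxout_layer(weights, bias, [2.0, -3.0], group_size=2)
print(outputs)  # [2.0, 3.0]
```

In this toy case the pre-activations are `[2.0, -3.0, -2.0, 3.0]`, so the two output units select `2.0` and `3.0`. Unlike a fixed nonlinearity such as a sigmoid, the shape of each unit's response is determined by its learned linear pieces.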