Speech recognition with localized time-frequency pattern detectors
K. Schutte, James R. Glass
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007-12-01
DOI: 10.1109/ASRU.2007.4430135 (https://doi.org/10.1109/ASRU.2007.4430135)
Citations: 14
Abstract
This paper presents a method for acoustic modeling of speech based on learning and detecting localized time-frequency patterns in a spectrogram. A boosting algorithm both builds the classifiers and selects features from a large set derived by filtering spectrograms. Initial experiments discriminate digits in the Aurora database. The system learns sequences of localized time-frequency patterns that are highly interpretable from an acoustic-phonetic viewpoint. Although the work and results are preliminary, they suggest that pursuing these techniques further could lead to new approaches to acoustic modeling for ASR that are more robust to noise and encode temporal dynamics better than typical features such as frame-based cepstra.
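
The abstract does not specify the filters, the boosting variant, or the classifier form, so the following is only a minimal sketch of the general idea: boosting that builds a classifier and, in doing so, selects a few features from a larger pool computed on spectrograms. It assumes generic discrete AdaBoost with single-threshold decision stumps, uses mean energy in fixed rectangular time-frequency patches as a stand-in for the paper's filtered-spectrogram features, and runs on synthetic data rather than Aurora. All function names, patch sizes, and parameters are hypothetical.

```python
import numpy as np

def patch_features(specs, patches):
    """Mean energy of each spectrogram inside each (f0, f1, t0, t1) patch.

    specs has shape (n_examples, n_freq, n_time). This box-filter feature is
    an illustrative assumption, not the feature set used in the paper.
    """
    return np.stack(
        [specs[:, f0:f1, t0:t1].mean(axis=(1, 2)) for f0, f1, t0, t1 in patches],
        axis=1,
    )

def stump_predict(X, j, thr, pol):
    """Predict +1 where pol * (feature j - thr) >= 0, else -1."""
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively find the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(X, j, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds):
    """Discrete AdaBoost; each round selects one feature via its stump."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        # Up-weight misclassified examples, down-weight correct ones.
        w *= np.exp(-alpha * y * stump_predict(X, j, thr, pol))
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def predict(model, X):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in model)
    return np.where(score >= 0, 1, -1)

# Synthetic demo: class +1 has extra energy in one time-frequency region.
rng = np.random.default_rng(0)
specs = rng.random((200, 16, 20))      # 200 fake spectrograms, 16 bins x 20 frames
y = rng.choice([-1, 1], size=200)
specs[y == 1, 4:8, 5:10] += 1.0        # planted localized pattern

# Candidate pool: a 4 x 4 grid of non-overlapping patches (16 features).
patches = [(f, f + 4, t, t + 5) for f in range(0, 16, 4) for t in range(0, 20, 5)]
X = patch_features(specs, patches)

model = adaboost(X, y, rounds=3)
accuracy = (predict(model, X) == y).mean()
print("first selected patch:", patches[model[0][1]])
print("training accuracy:", accuracy)
```

On this toy data the first boosting round should pick the patch covering the planted region, which illustrates how the selected features can be read back as localized time-frequency patterns. A real system would need a far larger and richer filter pool, held-out evaluation, and a mechanism for handling the temporal ordering of detected patterns, none of which is modeled here.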