Low complexity speaker independent command word recognition in car environments
S. Riis, O. Viikki
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000-06-05
DOI: 10.1109/ICASSP.2000.862089 (https://doi.org/10.1109/ICASSP.2000.862089)
In this paper we compare a standard HMM-based recognizer to a highly parameter-efficient hybrid called the hidden neural network (HNN). The comparison was carried out on a speaker-independent command word recognition task aimed at car hands-free applications. Monophone-based HMM and HNN recognizers were initially trained on clean Wall Street Journal British English data. Evaluating these baseline models on noisy car speech data showed that the HMMs performed better. After smoothing to the car environment, however, an HNN with 28k parameters provided a relative error rate reduction of 23-53% over HMMs containing 21k-168k parameters. Because the HNNs have few parameters, their real-time decoding complexity is 2-4 times lower than that of comparable HMMs. The low memory and computational requirements of the HNN make it particularly attractive for implementation on portable commercial hardware such as mobile phones and personal digital assistants.
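The "relative error rate reduction" quoted above is conventionally computed as the drop in error divided by the baseline error. The sketch below illustrates that arithmetic together with the parameter-count ratios implied by the abstract. The function names are hypothetical, and the error rates in the usage example are invented purely for illustration; they are not results from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction by which new_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Parameter counts quoted in the abstract.
HNN_PARAMS = 28_000
HMM_PARAMS_MIN = 21_000
HMM_PARAMS_MAX = 168_000


def parameter_ratio(hmm_params: int, hnn_params: int = HNN_PARAMS) -> float:
    """Return how many times larger the HMM is than the HNN."""
    return hmm_params / hnn_params


if __name__ == "__main__":
    # Hypothetical error rates (NOT from the paper): a baseline at 10%
    # and a second system at 5% error.
    reduction = relative_error_reduction(0.10, 0.05)
    print(f"relative error rate reduction: {reduction:.0%}")

    # Size of the smallest and largest HMMs relative to the 28k-parameter HNN.
    print(f"smallest HMM / HNN: {parameter_ratio(HMM_PARAMS_MIN):.2f}")
    print(f"largest HMM / HNN:  {parameter_ratio(HMM_PARAMS_MAX):.2f}")
```

Under this definition, the reported 23-53% range means the HNN removed between roughly a quarter and a half of the errors made by the HMMs it was compared against, while the largest of those HMMs had six times as many parameters.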