Authors: N. Morgan, Chuck Wooters, H. Hermansky
Venue: Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop
Published: 1991-09-30
DOI: https://doi.org/10.1109/NNSP.1991.239501
Experiments with temporal resolution for continuous speech recognition with multi-layer perceptrons
Previous work by the authors focused on integrating multilayer perceptrons (MLPs) into hidden Markov models (HMMs) and on using perceptual linear prediction (PLP) parameters as the feature inputs to such nets. The system uses the Viterbi algorithm for temporal alignment. This algorithm is simple and optimal, but it requires a frame-based analysis in which all features share the same implicit time constants. The authors offer a frame-based system a range of temporal/spectral resolution choices by using a layered network to incorporate this information for phonetic discrimination. They performed experiments in which they expanded their PLP analysis to include short analysis windows and trained phonetic classification networks to incorporate the added information. They hypothesized that classification scores would improve, especially for short-duration phonemes. These experiments did not yield the expected improvement.
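The idea of giving a frame-based system more than one temporal resolution can be illustrated with a minimal sketch. This is not the authors' implementation: log energy stands in for PLP analysis, and the function names, hop size, and window lengths are all hypothetical choices made for the example. Each frame keeps a single fixed rate, as Viterbi alignment requires, but carries one feature from a long window and one from a short window centered on the same instant.

```python
import math

FLOOR = 1e-10  # assumed floor to avoid log(0) on silent windows


def log_energy(samples):
    """Log of the mean squared amplitude (a stand-in for PLP features)."""
    power = sum(s * s for s in samples) / len(samples)
    return math.log(max(power, FLOOR))


def window_at(signal, center, length):
    """Return `length` samples centered on `center`, zero-padded at the edges."""
    start = center - length // 2
    return [signal[i] if 0 <= i < len(signal) else 0.0
            for i in range(start, start + length)]


def multi_resolution_frames(signal, hop, long_len, short_len):
    """One vector per frame: [long-window feature, short-window feature]."""
    frames = []
    for center in range(0, len(signal), hop):
        long_win = window_at(signal, center, long_len)
        short_win = window_at(signal, center, short_len)
        frames.append([log_energy(long_win), log_energy(short_win)])
    return frames


# Toy signal: silence containing a brief 8-sample burst.
signal = [0.0] * 400
for i in range(236, 244):
    signal[i] = 1.0

frames = multi_resolution_frames(signal, hop=80, long_len=160, short_len=32)
for index, (long_feat, short_feat) in enumerate(frames):
    print(index, round(long_feat, 3), round(short_feat, 3))
```

In this toy case the long window spreads the burst over three neighboring frames, while the short window registers it only in the frame centered on it. That is the kind of added short-duration detail a classification network could in principle exploit, although the abstract reports that the expected improvement did not appear.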