{"title":"在FPGA上使用连续隐马尔可夫模型对多个并行文件进行语音识别","authors":"S. Melnikoff, S. Quigley, M. Russell","doi":"10.1109/FPT.2002.1188720","DOIUrl":null,"url":null,"abstract":"Speech recognition is a computationally demanding task, particularly the stages which use Viterbi decoding for converting pre-processed speech data into words or subword unit, and the associated observation probability calculations, which employ multivariate Gaussian distributions; so any device that can reduce the load on, for example, a PC's processor, is advantageous. Hence we present two implementations of a speech recognition system incorporating an FPGA, employing continuous hidden Markov models (HMMs), and capable of processing three speech files simultaneously. The first uses monophones, and can perform recognition 250 times real time (in terms of average time per observation), as well as outperforming its software equivalent. The second uses biphones and triphones, reducing the speedup to 13 times real time.","PeriodicalId":355740,"journal":{"name":"2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Performing speech recognition on multiple parallel files using continuous hidden Markov models on an FPGA\",\"authors\":\"S. Melnikoff, S. Quigley, M. Russell\",\"doi\":\"10.1109/FPT.2002.1188720\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech recognition is a computationally demanding task, particularly the stages which use Viterbi decoding for converting pre-processed speech data into words or subword unit, and the associated observation probability calculations, which employ multivariate Gaussian distributions; so any device that can reduce the load on, for example, a PC's processor, is advantageous. Hence we present two implementations of a speech recognition system incorporating an FPGA, employing continuous hidden Markov models (HMMs), and capable of processing three speech files simultaneously. The first uses monophones, and can perform recognition 250 times real time (in terms of average time per observation), as well as outperforming its software equivalent. The second uses biphones and triphones, reducing the speedup to 13 times real time.\",\"PeriodicalId\":355740,\"journal\":{\"name\":\"2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-12-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPT.2002.1188720\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2002.1188720","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Performing speech recognition on multiple parallel files using continuous hidden Markov models on an FPGA
Speech recognition is a computationally demanding task, particularly the stages which use Viterbi decoding for converting pre-processed speech data into words or subword unit, and the associated observation probability calculations, which employ multivariate Gaussian distributions; so any device that can reduce the load on, for example, a PC's processor, is advantageous. Hence we present two implementations of a speech recognition system incorporating an FPGA, employing continuous hidden Markov models (HMMs), and capable of processing three speech files simultaneously. The first uses monophones, and can perform recognition 250 times real time (in terms of average time per observation), as well as outperforming its software equivalent. The second uses biphones and triphones, reducing the speedup to 13 times real time.