GPU acceleration of automated speech recognition for mobile devices
Authors: R. Veitch, R. Woods, Louis-Marie Aubert
Published in: 2011 9th IEEE International Conference on Industrial Informatics
Publication date: 2011-07-26
DOI: 10.1109/INDIN.2011.6034999 (https://doi.org/10.1109/INDIN.2011.6034999)
Citations: 1
Abstract
The implementation of a complex, large-vocabulary speech recognition application on modern graphics processors (GPUs) is presented. The parallel single instruction, multiple data (SIMD) architecture is exploited effectively by applying various optimizations that expose the algorithmic parallelism. The work focuses in particular on the realization of the Gaussian calculation, a key function. The result is an implementation that runs 3.75 times faster than real time and gives a tenfold speedup over a highly optimized sequential CPU-based implementation. The work is also compared with earlier work on building the same system on a Virtex 5-based Alpha Data XRC-5T1 reconfigurable computer.
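The abstract names the Gaussian calculation as the key function but does not give its form. As a minimal sketch only, assuming the diagonal-covariance Gaussian log-likelihood that is common in large-vocabulary recognizers (an assumption, not a detail confirmed by the source), the computation might look like the following. The function names `precompute` and `gaussian_log_likelihoods` and the toy model values are hypothetical and chosen purely for illustration.

```python
import math

def precompute(means, variances):
    """Fold per-Gaussian constants ahead of time (hypothetical helper).

    Assumes diagonal covariances, so each Gaussian is described by one
    mean vector and one variance vector of the same length.
    """
    consts, inv_two_var = [], []
    for mu, var in zip(means, variances):
        d = len(mu)
        log_det = sum(math.log(v) for v in var)
        # Normalization term: -0.5 * (d * log(2*pi) + log|Sigma|)
        consts.append(-0.5 * (d * math.log(2.0 * math.pi) + log_det))
        # Per-dimension weights 1 / (2 * variance)
        inv_two_var.append([1.0 / (2.0 * v) for v in var])
    return consts, inv_two_var

def gaussian_log_likelihoods(x, means, consts, inv_two_var):
    """Score one feature vector against every Gaussian.

    Each iteration of the outer loop is independent of the others, which is
    the kind of data parallelism a SIMD device can exploit, for example one
    thread per Gaussian. That mapping is an assumption for illustration,
    not the kernel layout described in the paper.
    """
    scores = []
    for mu, c, w in zip(means, consts, inv_two_var):
        dist = sum((xi - mi) ** 2 * wi for xi, mi, wi in zip(x, mu, w))
        scores.append(c - dist)
    return scores

# Toy model with two 2-dimensional Gaussians (made-up values).
means = [[0.0, 0.0], [1.0, -1.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
consts, inv_two_var = precompute(means, variances)
scores = gaussian_log_likelihoods([0.0, 0.0], means, consts, inv_two_var)
```

Precomputing the normalization constants and the inverse variances leaves one subtract, one square, and one multiply-accumulate per feature dimension in the inner loop, a regular pattern that maps naturally onto wide parallel hardware.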