Parallelized feature extraction and acoustic model training

Haofeng Kou, Weijia Shang
{"title":"Parallelized feature extraction and acoustic model training","authors":"Haofeng Kou, Weijia Shang","doi":"10.1109/ICDSP.2014.6900717","DOIUrl":null,"url":null,"abstract":"In this paper, we present our research on the parallelized speech recognition including both Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and Viterbi training for Hidden Markov Model (HMM) based acoustic model [2] on the Graphics Processing Units (GPU). Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models derived from the effectively parsed features. For common languages, state-of-the-art systems are extracted and trained on many thousands of hours of speech data and even with large clusters of machines the entire extracting and training process can take weeks. To overcome this development bottleneck, we not only demonstrate that feature extraction and acoustic model training are suitable for GPUs, but also propose the optimized parallel implementation using highly parallel GPUs by combining the MFCC feature extraction along with Viterbi training for HMM acoustic model, illustrate its application concurrency characteristics, data working set sizes, and describe the optimizations required for effective throughput on GPU processors. We demonstrate that feature extraction and acoustic model training are well suited for GPUs. Using one GTX580 our approach is shown to be overall approximately 95x faster than a sequential CPU implementation at the same accuracy level, enabling feature extraction and acoustic model training to be performed at realtime.","PeriodicalId":301856,"journal":{"name":"2014 19th International Conference on Digital Signal Processing","volume":"87 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 19th International Conference on Digital Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDSP.2014.6900717","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

In this paper, we present our research on parallelized speech recognition, covering both Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and Viterbi training of Hidden Markov Model (HMM) based acoustic models [2] on Graphics Processing Units (GPUs). Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models derived from effectively extracted features. For common languages, state-of-the-art systems are trained on many thousands of hours of speech data, and even with large clusters of machines the entire extraction and training process can take weeks. To overcome this development bottleneck, we not only demonstrate that feature extraction and acoustic model training are well suited to GPUs, but also propose an optimized parallel implementation on highly parallel GPUs that combines MFCC feature extraction with Viterbi training of the HMM acoustic model. We illustrate its application concurrency characteristics and data working set sizes, and describe the optimizations required for effective throughput on GPU processors. Using one GTX580, our approach is shown to be approximately 95x faster overall than a sequential CPU implementation at the same accuracy level, enabling feature extraction and acoustic model training to be performed in real time.
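For readers unfamiliar with the two algorithms the paper parallelizes, the following is a minimal single-threaded NumPy sketch of the standard MFCC pipeline (pre-emphasis, framing and windowing, power spectrum, Mel filterbank, log, DCT). It is purely illustrative: the frame length, hop size, filter count, and FFT size are common defaults, not values taken from the paper, and the paper's contribution is the GPU parallelization of these stages rather than this reference math.

```python
# Minimal CPU reference sketch of MFCC extraction (illustrative only; the
# paper's GPU implementation parallelizes these stages across frames/filters).
# Frame length, hop, filter count, and FFT size are common defaults, not
# values taken from the paper.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum via FFT.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # 4. Mel filterbank energies, floored to avoid log(0).
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sample_rate).T, 1e-10)
    # 5. Log and DCT-II; keep the first n_ceps coefficients.
    log_e = np.log(energies)
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return log_e @ dct_basis.T  # shape: (n_frames, n_ceps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_audio = rng.standard_normal(16000)  # 1 s of noise at 16 kHz
    print(mfcc(fake_audio).shape)            # -> (98, 13)
```

Viterbi training likewise relies on a standard building block: a Viterbi pass that finds the most likely HMM state sequence for each utterance, after which the acoustic model parameters are re-estimated from that alignment. Below is a minimal log-domain Viterbi decoder sketch with toy inputs, again an assumption-laden illustration rather than the paper's GPU kernel.

```python
# Minimal log-domain Viterbi decoder sketch for an HMM (illustrative only; in
# Viterbi training the best state path found here drives parameter re-estimation).
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """log_pi: (S,) initial state log-probs; log_A: (S, S) transition log-probs;
    log_B: (T, S) per-frame state log-likelihoods. Returns the best state path."""
    T, S = log_B.shape
    delta = np.full((T, S), -np.inf)       # best path score ending in each state
    backptr = np.zeros((T, S), dtype=int)  # argmax predecessor for backtracking
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        # scores[i, j] = score of reaching state j at time t via state i at t-1
        scores = delta[t - 1][:, None] + log_A
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = scores[backptr[t], np.arange(S)] + log_B[t]
    # Backtrace from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path
```

In Viterbi training, the single best path produced by such a pass replaces the full forward-backward statistics used in Baum-Welch re-estimation, which keeps the per-frame, per-state work simple and regular.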