Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge

T. A. Hanai, Wei-Ning Hsu, James R. Glass
Published in: 2016 IEEE Spoken Language Technology Workshop (SLT), 2016-12-01. DOI: 10.1109/SLT.2016.7846280 (https://doi.org/10.1109/SLT.2016.7846280)
Citations: 15

Abstract

The Arabic language, with over 300 million speakers, has significant diversity and breadth, which makes it challenging to build an automated system that understands what is said. This paper describes an Arabic Automatic Speech Recognition (ASR) system developed on a 1,200-hour speech corpus made available for the 2016 Arabic Multi-genre Broadcast (MGB) Challenge. A range of Deep Neural Network (DNN) topologies were modeled, including feed-forward, convolutional, time-delay, recurrent Long Short-Term Memory (LSTM), Highway LSTM (H-LSTM), and Grid LSTM (G-LSTM). The best performance came from a sequence-discriminatively trained G-LSTM neural network. The best overall Word Error Rate (WER) was 18.3% (p < 0.001) on the development set, obtained by combining the hypotheses of 3- and 5-layer sequence-discriminatively trained G-LSTM models that had been rescored with a 4-gram language model.
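For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the challenge; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization, which a real evaluation pipeline would likely need.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.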