Investigating hybrid approaches for Arabic text diacritization with recurrent neural networks

Saba' Alqudah, Gheith A. Abandah, Alaa Arabiyat
2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), published 2017-10-01
DOI: 10.1109/AEECT.2017.8257765 (https://doi.org/10.1109/AEECT.2017.8257765)
Citations: 20

Abstract

Deep neural networks are now used effectively to solve many complex problems, including the automatic diacritization of Arabic text. This paper investigates a hybrid approach to this problem based on a recurrent neural network (RNN). We use the MADAMIRA full morphological and syntactic analyzer to assist the RNN: only the analyzer's high-confidence diacritics and word segmentation output are fed to the RNN, which generates the fully diacritized output. On the LDC ATB3 benchmark, the suggested hybrid approach performs better than the statistical approach. It achieves a diacritic error rate of 2.39% and a word error rate of 8.40%, which are 34% and 26% improvements, respectively, over the best previous hybrid results. We implemented the RNN using parallel software and hardware, using the CURRENNT library to run it on a GPU with 16 streaming multiprocessors. Compared with the previous RNN-based system, our solution is 326 times faster to train and takes an average of 0.003 seconds to diacritize a word. This speed makes it feasible to train on very large data sets and so build larger and more accurate deep neural networks.
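
The abstract reports diacritic and word error rates without defining them. The sketch below shows one common way such rates are computed, under assumptions that are not taken from the paper: the diacritic error rate is the share of letters whose attached diacritics differ from the reference, and the word error rate is the share of words containing at least one such letter. The function names `split_diacritics` and `error_rates` and the `DIACRITICS` set are illustrative only, and the paper's own counting rules (for example, how undiacritized letters or case endings are treated) may differ.

```python
# Assumed set of Arabic diacritic code points: tanween forms, short vowels,
# shadda, sukun, and the dagger alif. The paper may use a different set.
DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652\u0670")


def split_diacritics(word):
    """Split a word into (letter, attached diacritics) pairs."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding letter
                letter, marks = pairs[-1]
                pairs[-1] = (letter, marks + ch)
            # a mark with no preceding letter is ignored
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis):
    """Return (diacritic error rate, word error rate) as fractions in [0, 1].

    Both strings must contain the same letters and differ only in diacritics.
    """
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis differ in word count")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_diacritics(ref_word)
        hyp_pairs = split_diacritics(hyp_word)
        if [l for l, _ in ref_pairs] != [l for l, _ in hyp_pairs]:
            raise ValueError("words differ in their base letters")
        # Compare sorted marks so shadda+vowel matches in either order.
        wrong = sum(
            1
            for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs)
            if sorted(ref_marks) != sorted(hyp_marks)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += 1 if wrong else 0

    if letters == 0:
        raise ValueError("no letters to score")
    return letter_errors / letters, word_errors / len(ref_words)
```

Calling `error_rates(gold_text, system_text)` on a gold-standard sentence and a system output returns both rates as fractions; multiply by 100 to compare with percentages such as those quoted above.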