Investigating hybrid approaches for Arabic text diacritization with recurrent neural networks
Saba' Alqudah, Gheith A. Abandah, Alaa Arabiyat
2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), 2017-10-01
DOI: 10.1109/AEECT.2017.8257765 (https://doi.org/10.1109/AEECT.2017.8257765)
Citations: 20
Abstract
Deep neural networks are used efficiently today to solve many complex problems, including the automatic diacritization of Arabic text. This paper investigates a hybrid approach to this problem based on a recurrent neural network (RNN). We use the MADAMIRA full morphological and syntactical analyzer to assist the RNN. Only the high-confidence diacritics and word segmentation output of this analyzer are fed to the RNN, which generates the fully diacritized output. On the LDC ATB3 benchmark, the suggested hybrid approach performs better than the statistical approach. It achieves diacritic and word error rates of 2.39% and 8.40%, respectively, which are improvements of 34% and 26%, respectively, over the best previous hybrid results. We implemented the RNN using parallel software and hardware: we use the CURRENNT library to run the RNN on a GPU with 16 streaming multiprocessors. Compared with the previous RNN-based system, our solution is 326 times faster to train and takes an average of 0.003 seconds to diacritize a word. This speed makes it feasible to train on very large data sets and thus to build larger and more accurate deep neural networks.
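The two metrics quoted above can be illustrated with a minimal sketch. It assumes the common definitions: the diacritic error rate is the percentage of base characters whose diacritics differ from the reference, and the word error rate is the percentage of words containing at least one such character. The abstract does not state the paper's exact scoring conventions (for example, which characters are counted), so the function names `split_diacritics` and `error_rates` and the diacritic range U+064B to U+0652 are assumptions made for illustration only.

```python
# Assumed diacritic set: fathatan through sukun (U+064B..U+0652).
DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def split_diacritics(word):
    """Return a list of (base_char, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch not in DIACRITICS:
            pairs.append((ch, ""))
        elif pairs:
            # Attach the mark to the preceding base character.
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        # A mark with no preceding base character is ignored.
    return pairs


def error_rates(reference, hypothesis):
    """Return (diacritic error rate, word error rate) as percentages.

    Both strings must contain the same words with the same base letters;
    only the diacritics may differ.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis have different word counts")

    char_total = char_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_diacritics(ref_word)
        hyp_pairs = split_diacritics(hyp_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("base letters differ between the two texts")
        # Sort the marks so that shadda+fatha equals fatha+shadda.
        wrong = sum(
            sorted(ref_marks) != sorted(hyp_marks)
            for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs)
        )
        char_total += len(ref_pairs)
        char_errors += wrong
        word_errors += wrong > 0

    der = 100.0 * char_errors / char_total if char_total else 0.0
    wer = 100.0 * word_errors / len(ref_words) if ref_words else 0.0
    return der, wer


if __name__ == "__main__":
    # Two words, eight base letters; only the final case ending differs.
    ref = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F"
    hyp = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064E"
    print(error_rates(ref, hyp))  # (12.5, 50.0)
```

In this toy input one of eight base letters carries a wrong diacritic, and it falls in one of two words. A single wrong mark is enough to count the whole word as an error, so the word error rate is usually well above the diacritic error rate, as in the 2.39% versus 8.40% figures above.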