Building the Waray-waray Neural Language Model using Recurrent Neural Network

Impact factor: 0.4 | JCR quartile: Q4 | Category: Multidisciplinary Sciences
Fernando E. Quiroz, Jr., Chona B. Sabinay, Jeneffer A. Sabonsolin
Published: 2023-06-23
DOI: 10.61310/mndjsteect.1170.23 (https://doi.org/10.61310/mndjsteect.1170.23)
Citations: 0

Abstract

In the Philippines, language modeling is challenging because most of the country's languages are low-resourced. Tagalog and Cebuano are the only Philippine languages available on machine translation platforms like Google Translate; Winaray, a language spoken in the Eastern Visayas region, is absent. Hence, this study developed a Winaray language model that could be used in natural language processing tasks. The text corpus used to create the model was scraped from websites containing Winaray sentences (religious and local news websites, and Wikipedia). The model was trained using an encoder-decoder recurrent neural network with four sequential layers and 100 hidden neurons. The text prediction accuracy of the model reached 76.17%. The model's generated sentences were manually evaluated using linguistic quality dimensions such as grammaticality, non-redundancy, focus, structure and coherence. The manual evaluation was promising, with a linguistic quality score of 3.66 (acceptable); however, the training data must be enlarged with texts from a wider range of genres.
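The abstract reports a text prediction accuracy but does not say how it was computed. As a minimal sketch, assuming accuracy means the fraction of next-token predictions that match the reference token, the metric could look like the following. The function name and the placeholder tokens are hypothetical and are not taken from the study.

```python
from typing import Sequence


def next_token_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the fraction of positions where the predicted token equals the reference.

    Assumed definition for illustration; the paper's exact metric is not given
    in the abstract.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on empty sequences")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical placeholder tokens, not Winaray data from the study.
reference = ["w1", "w2", "w3", "w4"]
predicted = ["w1", "w2", "w9", "w4"]

accuracy = next_token_accuracy(predicted, reference)
print(f"{accuracy:.2%}")  # prints "75.00%"
```

Under this assumed definition, the reported 76.17% would mean that roughly three out of four predicted tokens matched the held-out text.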
Source journal: Mindanao Journal of Science and Technology (Multidisciplinary Sciences)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 18