One-expression classification in Bengali and its role in Bengali-English machine translation

Apurbalal Senapati, Utpal Garain
DOI: 10.1109/IALP.2014.6973489 (https://doi.org/10.1109/IALP.2014.6973489)
Published in: 2014 International Conference on Asian Language Processing (IALP), 2014-12-04
Citations: 1

Abstract

This paper analyzes one-expressions in Bengali and shows the effectiveness of this analysis for machine translation. The characteristics of one-expressions are studied in a 177-million-word corpus. A classification scheme is proposed for grouping one-expressions. The features contributing to the classification are identified, and a CRF-based classifier is trained on an annotated dataset, created by the authors, that contains 2006 instances of one-expressions. The classifier's performance is tested on a separate test set of 300 Bengali one-expression instances, distinct from the training data. Evaluation shows that the classifier correctly classifies one-expressions in 75% of cases. Finally, the utility of this classification for Bengali-English machine translation is investigated. Translation accuracy improves from 39% (Google Translate) to 60% (the proposed approach), and this improvement is statistically significant. All annotated datasets (none existed previously) are made freely available to facilitate further research on this topic.
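The abstract does not list the classifier's features or its label set. As a rough illustration of what a CRF-based tagger does at prediction time, the sketch below implements Viterbi decoding for a linear-chain CRF in plain Python. The label names (`O`, `NUM`, `DET`, `PRON`), the function name `viterbi`, and the assumption that feature weights have already been summed into per-position emission scores are all hypothetical and are not taken from the paper. Training the weights is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label set (NOT the paper's classes): O = not a one-expression,
# NUM = numeral-like use, DET = article-like use, PRON = pronoun-like use.
LABELS = ["O", "NUM", "DET", "PRON"]


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str] = LABELS,
) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][y] is the (assumed pre-summed) weight of the features that
    fire at position t for label y; transitions[(y_prev, y)] is the weight of
    moving from y_prev to y. Missing entries score 0.0.
    """
    if not emissions:
        return [], 0.0
    # best[t][y]: best score of any label path that ends in y at position t.
    best: List[Dict[str, float]] = [
        {y: emissions[0].get(y, 0.0) for y in labels}
    ]
    # back[t][y]: label at position t-1 on the best path ending in y at t.
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            scores[y] = score + emissions[t].get(y, 0.0)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best-scoring final label.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

In this example the emission scores alone slightly favor `NUM` at the first position, but a transition weight favoring `DET` followed by `O` flips the decision: `viterbi([{"DET": 1.0, "NUM": 1.2}, {"O": 1.0}], {("DET", "O"): 1.0})` returns `(["DET", "O"], 3.0)`. This is the usual motivation for a sequence model over an independent per-token classifier: neighboring labels can override a locally preferred label.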