{"title":"One-expression classification in Bengali and its role in Bengali-English machine translation","authors":"Apurbalal Senapati, Utpal Garain","doi":"10.1109/IALP.2014.6973489","DOIUrl":null,"url":null,"abstract":"This paper attempts to analyze one-expressions in Bengali and shows its effectiveness for machine translation. The characteristics of one-expressions are studied in 177 million word corpus. A classification scheme has been proposed for the grouping the one-expressions. The features contributing towards the classification are identified and a CRF-based classifier is trained on an authors' generated annotated dataset containing 2006 instances of one-expressions. The classifier's performance is tested on a test set (containing 300 instances of Bengali one-expressions) which is different from the training data. Evaluation shows that the classifier can correctly classify the one-expressions in 75% cases. Finally, the utility of this classification task is investigated for machine translation (Bengali-English). The translation accuracy is improved from 39% (by Google translator) to 60% (by the proposed approach) and this improvement is found to be statistically significant. All the annotated datasets (there was none before) are made free to facilitate further research on this topic.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Asian Language Processing (IALP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2014.6973489","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
This paper attempts to analyze one-expressions in Bengali and shows its effectiveness for machine translation. The characteristics of one-expressions are studied in 177 million word corpus. A classification scheme has been proposed for the grouping the one-expressions. The features contributing towards the classification are identified and a CRF-based classifier is trained on an authors' generated annotated dataset containing 2006 instances of one-expressions. The classifier's performance is tested on a test set (containing 300 instances of Bengali one-expressions) which is different from the training data. Evaluation shows that the classifier can correctly classify the one-expressions in 75% cases. Finally, the utility of this classification task is investigated for machine translation (Bengali-English). The translation accuracy is improved from 39% (by Google translator) to 60% (by the proposed approach) and this improvement is found to be statistically significant. All the annotated datasets (there was none before) are made free to facilitate further research on this topic.