A Crowd-Source Based Corpus on Bangla to English Translation
Authors: Nafisa Nowshin, Zakia Sultana Ritu, Sabir Ismail
Published in: 2018 21st International Conference of Computer and Information Technology (ICCIT), 2018-12-01
DOI: 10.1109/ICCITECHN.2018.8631947 (https://doi.org/10.1109/ICCITECHN.2018.8631947)
Citation count: 2
Abstract
In this paper, we present a crowd-sourced Bangla-to-English parallel corpus and evaluate its accuracy. A complete and informative corpus is necessary for developing any language through automated processes. A Bangla-to-English parallel corpus is important for various multilingual applications and NLP research, yet a complete one is still scarce. We propose a large-scale method for constructing a Bangla-to-English parallel corpus through crowd-sourcing. We chose crowd-sourcing to explore a new approach to corpus construction and to evaluate human behavior patterns in the process. The translations were collected from undergraduate university students to ensure strong language knowledge. A Bangla-to-English parallel corpus will also help in comparing the linguistic features of the two languages. The initial dataset presented here will serve as a baseline for further analysis of crowd-sourced corpora. Our primary dataset consists of 517 Bangla sentences; for every Bangla sentence we collected about 4 English translations on average, for 2143 English sentences in total. The data was collected from 62 users over a period of 2 months. Finally, we analyze the dataset and offer some concluding ideas for further research.
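
The figures in the abstract can be checked with a little arithmetic. The sketch below is only an illustration: the `ParallelEntry` layout and the function name are assumptions made here, not the authors' data format, and only the three totals (517, 2143, 62) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParallelEntry:
    """One Bangla sentence with its crowd-sourced English translations.

    Hypothetical layout; the paper does not describe its storage format.
    """
    bangla: str
    english: List[str] = field(default_factory=list)


def translations_per_sentence(entries: List[ParallelEntry]) -> float:
    """Average number of English translations per Bangla sentence."""
    if not entries:
        return 0.0
    return sum(len(e.english) for e in entries) / len(entries)


# Totals reported in the abstract.
BANGLA_SENTENCES = 517
ENGLISH_SENTENCES = 2143
CONTRIBUTORS = 62

# About 4.1 translations per sentence, consistent with "4 on average".
avg_per_sentence = ENGLISH_SENTENCES / BANGLA_SENTENCES
# About 34.6 translations per contributor over the 2-month period.
avg_per_contributor = ENGLISH_SENTENCES / CONTRIBUTORS
```

Keeping a list of translations per source sentence, rather than one row per sentence pair, makes the one-to-many nature of a crowd-sourced corpus explicit and lets per-sentence statistics be computed directly.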