Authors: Nafisa Nowshin, Zakia Sultana Ritu, Sabir Ismail
Venue: 2018 21st International Conference of Computer and Information Technology (ICCIT)
Publication date: 2018-12-01
DOI: 10.1109/ICCITECHN.2018.8631947 (https://doi.org/10.1109/ICCITECHN.2018.8631947)
A Crowd-Source Based Corpus on Bangla to English Translation
In this paper, we present a crowd-sourced Bangla-to-English parallel corpus and evaluate its accuracy. A complete and informative corpus is necessary for the development of any language through automated processing. A Bangla-to-English parallel corpus is important for various multilingual applications and NLP research, yet a complete one is still scarce. We propose a large-scale crowd-sourcing method for constructing such a corpus. We chose crowd-sourcing to explore a new approach to corpus construction and to evaluate human behavior patterns in the process. The translations were collected from undergraduate university students to ensure strong language knowledge. A Bangla-to-English parallel corpus will also help in comparing the linguistic features of the two languages. We present an initial dataset prepared via crowd-sourcing, which will serve as a baseline for further analysis of crowd-sourced corpora. Our primary dataset consists of 517 Bangla sentences; for every Bangla sentence we collected 4 English sentences on average, for 2143 English sentences in total. The data was collected from 62 users over a period of 2 months. Finally, we analyze the dataset and offer some concluding ideas for further research.
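
The reported figures can be related with a small sketch. The abstract does not describe how the corpus is stored, so the record layout (source sentence, translation, user id) and the function name `corpus_stats` are assumptions made here for illustration only; this is not the authors' tooling.

```python
from collections import defaultdict


def corpus_stats(records):
    """Summarize a multi-reference parallel corpus.

    `records` is an iterable of (bangla_sentence, english_translation, user_id)
    tuples. This layout is hypothetical; the paper's format is not given.
    """
    translations = defaultdict(list)  # source sentence -> its English translations
    users = set()
    for source, target, user in records:
        translations[source].append(target)
        users.add(user)

    total = sum(len(targets) for targets in translations.values())
    return {
        "source_sentences": len(translations),
        "translations": total,
        "contributors": len(users),
        "mean_per_source": total / len(translations) if translations else 0.0,
    }


# Consistency check using only the totals stated in the abstract:
# 2143 English sentences over 517 Bangla sentences.
print(round(2143 / 517, 2))  # prints 4.15
```

The mean of about 4.15 translations per sentence agrees with the "4 English sentences on average" stated above.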