Automatic detection of English words in Benglish text: A statistical approach
Bibekananda Kundu, S. Chandra
2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)
Published: 2012-12-01
DOI: 10.1109/IHCI.2012.6481827 (https://doi.org/10.1109/IHCI.2012.6481827)
Citation count: 8
Abstract
Code-mixing and code-switching create challenges for natural language processing applications such as machine translation and speech-to-speech translation. Detecting foreign words is essential for smooth processing of natural language. This paper introduces a statistical, language-independent approach for automatic detection of foreign words in mixed-language text. Initially, the proposed approach has been applied to Benglish text, that is, Bangla text that contains English words. The methodology can be easily adopted for other languages where such code-mixing exists. The proposed approach yields an accuracy of 71.82% when tested on sentences collected from Bangla blogs and social networking websites.
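
The abstract does not say which statistical model the authors use, so the following is only an illustrative sketch of one common way to detect foreign words statistically, not the paper's method. It assumes romanized tokens and scores each word with two add-one smoothed character-bigram models, tagging it as English when the English model gives the higher length-normalised log-probability. The word lists, the `BigramModel` class, and the `tag_tokens` function are hypothetical names and toy data introduced for this example.

```python
import math
from collections import Counter


def char_bigrams(word):
    """Return the character bigrams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramModel:
    """Add-one smoothed character-bigram model (an assumption, not the paper's model)."""

    def __init__(self, words):
        self.counts = Counter(bg for w in words for bg in char_bigrams(w))
        self.total = sum(self.counts.values())

    def log_prob(self, word, vocab_size):
        """Average log-probability per bigram, so word length does not dominate."""
        bigrams = char_bigrams(word)
        score = sum(
            math.log((self.counts[bg] + 1) / (self.total + vocab_size))
            for bg in bigrams
        )
        return score / len(bigrams)


def tag_tokens(tokens, english_model, bangla_model):
    """Tag each token 'EN' or 'BN' by comparing the two model scores."""
    # Shared smoothing vocabulary: every bigram seen by either model.
    vocab_size = len(set(english_model.counts) | set(bangla_model.counts))
    tagged = []
    for token in tokens:
        en = english_model.log_prob(token, vocab_size)
        bn = bangla_model.log_prob(token, vocab_size)
        tagged.append((token, "EN" if en > bn else "BN"))
    return tagged


# Toy training data (hypothetical; a real system would need large corpora).
ENGLISH_WORDS = ["the", "computer", "school", "friend",
                 "download", "network", "please", "thanks"]
BANGLA_WORDS = ["ami", "tumi", "bhalo", "kemon",
                "amar", "tomar", "kotha", "jabo"]

english_model = BigramModel(ENGLISH_WORDS)
bangla_model = BigramModel(BANGLA_WORDS)

result = tag_tokens("ami download korbo".split(), english_model, bangla_model)
print(result)
# [('ami', 'BN'), ('download', 'EN'), ('korbo', 'BN')]
```

Because the models use only character statistics and no language-specific rules, the same sketch could be retrained on word lists for another language pair, which is the sense in which such an approach is language independent.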