Text-based Language Identifier using Multinomial Naïve Bayes Algorithm
S. Rawat, Lakshita Werulkar, Sagarika Jaywant
International Journal of Next-Generation Computing, published 2023-02-15
DOI: 10.47164/ijngc.v14i1.1024 (https://doi.org/10.47164/ijngc.v14i1.1024)
Citation count: 0
Abstract
Language identification is a crucial step in any NLP-based application. The number of text-based documents and webpages on the Internet is growing rapidly, and documents written in languages from all over the world are available with a single click. A language identifier is therefore necessary to help users interpret this content. Research on language identification has so far concentrated on European languages and remains rather limited for traditional Indian languages. Many researchers have also become more interested in language identification for closely related languages. In this paper, the Multinomial Naïve Bayes algorithm is used to identify three languages written in the Devanagari script (Marathi, Sanskrit, and Hindi) and three European languages (French, Italian, and English). An experiment on datasets for each language produced satisfactorily accurate results after training and testing the model.
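The abstract does not describe the features, preprocessing, or datasets used, so the following is only a minimal sketch of how a Multinomial Naïve Bayes language identifier can work, not the authors' implementation. It assumes character bigrams as features, Laplace (add-one) smoothing, and a tiny hand-written toy corpus; the names `char_ngrams`, `MultinomialNB`, `texts`, and `labels` are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (bigrams by default)."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class MultinomialNB:
    """Toy multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.doc_counts = Counter(labels)    # label -> number of documents
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(n-gram | label) per label."""
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counts in self.counts.items():
            score = math.log(self.doc_counts[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for gram in char_ngrams(text):
                if gram in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (illustrative only; not the datasets used in the paper).
texts = [
    "the cat is on the table", "this is a good book",
    "le chat est sur la table", "c'est un bon livre",
    "il gatto è sul tavolo", "questo è un buon libro",
    "यह एक अच्छी किताब है", "बिल्ली मेज पर है",
    "हे एक चांगले पुस्तक आहे", "मांजर टेबलावर आहे",
    "एतत् उत्तमं पुस्तकम् अस्ति", "बिडालः फलके अस्ति",
]
labels = ["en", "en", "fr", "fr", "it", "it",
          "hi", "hi", "mr", "mr", "sa", "sa"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("the book is on the table"))
print(model.predict("यह किताब अच्छी है"))
```

Character n-grams are a common choice for this task because they capture script and spelling patterns without needing a tokenizer for each language, which matters when separating languages that share the Devanagari script. A realistic system would need far more training text per language than this toy corpus.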