Word Embedding in Nepali Language using Word2Vec
Bipesh Subedi, Prakash Poudyal
Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval, 2022-12-16
DOI: 10.1145/3582768.3582799 (https://doi.org/10.1145/3582768.3582799)
Citations: 0
Abstract
Word embedding is a technique for capturing relationships among words by mapping each word to a numeric vector. Considerable research has been carried out in this field for languages such as English, Hindi, and Bengali, but very little work is available for Nepali. In this work, word embeddings are built with Word2Vec for Nepali news data. The methodology consists of two stages: dataset preparation and Word2Vec modelling. The Gensim package is used to implement the Word2Vec model, and its output shows the similarity between Nepali words. The work focuses on developing word embeddings for Nepali words obtained by scraping the health section of Nepali news portals, and it has shown promising results.
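The two stages named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the sample text, the function names, and the window size are all hypothetical, and the abstract does not say whether the skip-gram or CBOW variant of Word2Vec was used (skip-gram is assumed here). The sketch splits Nepali text into sentences on the danda (।), tokenizes on whitespace, lists the (center, context) pairs that skip-gram training would see, and computes the cosine similarity that is commonly used to compare word vectors.

```python
import math
import re

# Hypothetical two-sentence sample standing in for scraped health-news text.
SAMPLE_TEXT = "बिरामी अस्पताल गए। अस्पतालमा उपचार भयो।"


def tokenize_sentences(text):
    """Split text into sentences on the danda, '?' or '!', then into whitespace tokens."""
    sentences = re.split(r"[।?!]", text)
    return [s.split() for s in sentences if s.strip()]


def skipgram_pairs(sentences, window=2):
    """Return (center, context) pairs within `window` tokens of each center word."""
    pairs = []
    for tokens in sentences:
        for i, center in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((center, tokens[j]))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    sentences = tokenize_sentences(SAMPLE_TEXT)
    for center, context in skipgram_pairs(sentences, window=1):
        print(center, context)
```

Tokenized sentences of this shape (a list of token lists) are also what Gensim's `gensim.models.Word2Vec` accepts as its `sentences` argument, so a preparation step like `tokenize_sentences` could feed the Gensim model described in the abstract. A real corpus would likely need further cleaning, such as removing punctuation and numerals, which this sketch leaves out.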