Asraf Hossain Patoary, Md. Jahid Bin Kibria, Abdul Kaium
2020 IEEE Region 10 Symposium (TENSYMP), pp. 308-311. Published 2020-06-05. DOI: 10.1109/TENSYMP50017.2020.9230907 (https://doi.org/10.1109/TENSYMP50017.2020.9230907)
Implementation of Automated Bengali Parts of Speech Tagger: An Approach Using Deep Learning Algorithm
Parts-of-speech (POS) tagging assigns a part-of-speech label to each word in a sentence, and it is an important first step in many Natural Language Processing (NLP) applications. For some languages POS tagging already achieves high accuracy, but for Bengali it remains an unsolved problem. Bengali is highly ambiguous and inflectional: a single word can take many variant forms depending on its suffixes and prefixes. Although POS tagging for Bengali is not new, we aim to build a highly accurate model from a minimal dataset. We developed a deep learning model that relies mainly on suffixes, which are part of Bengali grammar. We experimented with a Bengali corpus of 2927 words, each annotated with its part-of-speech tag. The proposed model achieves an accuracy of 93.90%. We also released the model as a Python package in our open-source Bengali Natural Language Processing Toolkit (BNLTK), which is available on pypi.org.
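To illustrate what "mainly based on suffixes" could mean in practice, here is a minimal sketch of suffix feature extraction in Python. It is not the authors' implementation and does not use the BNLTK API: the function name `suffix_features`, the `max_len` parameter, and the rule of leaving at least one code point of stem are all assumptions made for illustration. Suffixes are measured in Unicode code points, which is a simplification for Bengali, where a vowel sign counts as its own code point.

```python
from typing import Dict


def suffix_features(word: str, max_len: int = 3) -> Dict[str, str]:
    """Return the trailing 1..max_len code points of `word` as features.

    Hypothetical helper for illustration only; the paper does not
    specify how its suffix features are computed.
    """
    feats: Dict[str, str] = {}
    for k in range(1, max_len + 1):
        # Assumption: keep at least one code point as the stem, so a
        # suffix is never the whole word.
        if len(word) > k:
            feats[f"suffix_{k}"] = word[-k:]
    return feats


if __name__ == "__main__":
    # Example with an inflected Bengali noun; output depends on code points.
    print(suffix_features("ছেলেরা"))
```

Features like these could then be encoded and passed to a classifier alongside the word itself; how the published model combines them is not stated in the abstract.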