End to End Parts of Speech Tagging and Named Entity Recognition in Bangla Language

2019 International Conference on Bangla Speech and Language Processing (ICBSLP), Pub Date: 2019-09-01, DOI: 10.1109/ICBSLP47725.2019.201541

Jillur Rahman Saurav, Summit Haque, Farida Chowdhury

Citation count: 0

Abstract

Automatic part-of-speech (POS) tagging is one of the most fundamental tasks in Natural Language Processing (NLP), and its output serves as a feature for more advanced NLP tasks. Named Entity Recognition (NER) is another essential NLP task, particularly for information retrieval. Researchers have not yet found satisfactory solutions to these two tasks for Bangla, unlike for languages such as English and German. Moreover, many existing solutions depend heavily on handcrafted features that require strong linguistic expertise. Because these two sequence labeling tasks are similar, this work prepared two separate datasets, one for POS tagging and one for NER. It then studied different deep neural network approaches for solving each task separately. All of the approaches were end-to-end and required no handcrafted features such as word suffixes or affixes, gazetteers, or dictionaries. The study proposes an end-to-end solution based on a deep neural network model that combines a Bidirectional Long Short-Term Memory (BLSTM) network, a Convolutional Neural Network (CNN) and a Conditional Random Field (CRF). Trained on the respective prepared datasets, the proposed model achieved an accuracy of 93.86% on POS tagging and a strict F1 score of 0.6285 on NER.
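The abstract reports NER performance as a "strict" F1 score but does not describe the evaluation script. A common reading of strict matching is that a predicted entity counts as correct only when both its boundaries and its type exactly match a gold entity. The sketch below computes micro-averaged entity-level precision, recall and F1 under that assumption. It also assumes the labels use the BIO scheme. The scheme and the helper names `bio_to_spans` and `strict_f1` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one sentence's BIO tags (assumed scheme).

    An I- tag that does not continue an open entity of the same type is
    treated as the start of a new entity (a lenient convention).
    """
    spans: Set[Span] = set()
    start, etype = None, None
    # Append a sentinel "O" so the final open entity is flushed.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # still inside the current entity
        if etype is not None:
            spans.add((etype, start, i))  # close the open entity
        if tag == "O":
            start, etype = None, None
        else:
            start, etype = i, tag[2:]  # B-X, or an I-X that starts a new entity
    return spans


def strict_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span-and-type matching."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # only exact (type, start, end) matches count
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "O", "O", "B-LOC"]]
    print(strict_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In the example, the predicted PER entity covers only the first of its two gold tokens. Under strict matching it earns no partial credit, so only the LOC entity is a true positive and precision, recall and F1 are all 0.5.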