基于词首字母特征的壮语命名实体识别

Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence Pub Date : 2022-03-04 DOI:10.1145/3529466.3529478

Weiquan Zhang, Suqin Tang, Danni He, Tinghui Li, Changchun Pan

{"title":"基于词首字母特征的壮语命名实体识别","authors":"Weiquan Zhang, Suqin Tang, Danni He, Tinghui Li, Changchun Pan","doi":"10.1145/3529466.3529478","DOIUrl":null,"url":null,"abstract":"Named entity recognition is an important task and basis for the intelligent information processing and knowledge representation learning of the Zhuang Language. A BilSTM-CNN-CRF network model combining the uppercase and lowercase characters of words is proposed to be applied to the named entity recognition task of the Zhuang language, which lacks corpus for named entity labeling. Firstly, word2vec is used to train in unmarked Zhuang text to get the word vector of the Zhuang language. Then convolutional neural network is used to extract the character features of Zhuang words, and the character feature vector is obtained. The above two vectors were connected with the initial case feature vectors, which are randomly generated, and then the connected vectors were input into a BilSTM-CNN-CRF model for training; thus, the end-to-end named entity recognition model of Zhuang language was constructed. Experimental results show that, without relying on artificial features and external dictionaries, the proposed method in this study is superior to contrastive models by achieving an 80.37% F1 value in the named entity recognition task, which leads to the realization of automated named entity recognition of Zhuang language.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Named Entity Recognition of Zhuang Language Based on the Feature of Initial Letter in Word\",\"authors\":\"Weiquan Zhang, Suqin Tang, Danni He, Tinghui Li, Changchun Pan\",\"doi\":\"10.1145/3529466.3529478\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Named entity recognition is an important task and basis for the intelligent information processing and knowledge representation learning of the Zhuang Language. A BilSTM-CNN-CRF network model combining the uppercase and lowercase characters of words is proposed to be applied to the named entity recognition task of the Zhuang language, which lacks corpus for named entity labeling. Firstly, word2vec is used to train in unmarked Zhuang text to get the word vector of the Zhuang language. Then convolutional neural network is used to extract the character features of Zhuang words, and the character feature vector is obtained. The above two vectors were connected with the initial case feature vectors, which are randomly generated, and then the connected vectors were input into a BilSTM-CNN-CRF model for training; thus, the end-to-end named entity recognition model of Zhuang language was constructed. Experimental results show that, without relying on artificial features and external dictionaries, the proposed method in this study is superior to contrastive models by achieving an 80.37% F1 value in the named entity recognition task, which leads to the realization of automated named entity recognition of Zhuang language.\",\"PeriodicalId\":375562,\"journal\":{\"name\":\"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3529466.3529478\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3529466.3529478","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

命名实体识别是壮语智能信息处理和知识表示学习的重要任务和基础。提出了一种结合单词大小写字符的BilSTM-CNN-CRF网络模型，并将其应用于缺少命名实体标注语料库的壮语命名实体识别任务。首先，使用word2vec对未标记的壮语文本进行训练，得到壮语的词向量;然后利用卷积神经网络提取壮词的特征，得到壮词的特征向量;将上述两个向量与随机生成的初始案例特征向量连接，然后将连接的向量输入到BilSTM-CNN-CRF模型中进行训练;从而构建了壮语端到端的命名实体识别模型。实验结果表明，在不依赖人工特征和外部词典的情况下，本文提出的方法在命名实体识别任务中达到了80.37%的F1值，优于对比模型，实现了庄语命名实体的自动识别。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Named Entity Recognition of Zhuang Language Based on the Feature of Initial Letter in Word

Named entity recognition is an important task and basis for the intelligent information processing and knowledge representation learning of the Zhuang Language. A BilSTM-CNN-CRF network model combining the uppercase and lowercase characters of words is proposed to be applied to the named entity recognition task of the Zhuang language, which lacks corpus for named entity labeling. Firstly, word2vec is used to train in unmarked Zhuang text to get the word vector of the Zhuang language. Then convolutional neural network is used to extract the character features of Zhuang words, and the character feature vector is obtained. The above two vectors were connected with the initial case feature vectors, which are randomly generated, and then the connected vectors were input into a BilSTM-CNN-CRF model for training; thus, the end-to-end named entity recognition model of Zhuang language was constructed. Experimental results show that, without relying on artificial features and external dictionaries, the proposed method in this study is superior to contrastive models by achieving an 80.37% F1 value in the named entity recognition task, which leads to the realization of automated named entity recognition of Zhuang language.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence

自引率

0.00%

发文量