Leveraging Integrated Learning for Open-Domain Chinese Named Entity Recognition

Journal ranking: Q2 (Decision Sciences)
Jin Diao; Zhangbing Zhou; Guangli Shi
DOI: 10.26599/IJCS.2022.9100015
Published in: International Journal of Crowd Science, vol. 6, no. 2, pp. 74-79, June 2022 (journal article)
Publisher page: https://ieeexplore.ieee.org/document/9815847/
Citations: 3

Abstract

Named entity recognition (NER) is a fundamental natural language processing technique and a prerequisite for tasks such as natural language question reasoning, text matching, and semantic text similarity. Compared with English, Chinese NER is further complicated by noise arising from the complex meanings, diverse structures, and ambiguous semantic boundaries of the Chinese language itself. In addition, compared with specific domains, the open domain has more complex and variable entity types and a considerably larger number of entities, which makes open-domain Chinese NER especially difficult, and existing open-domain NER methods achieve low recognition rates. To address this, this paper proposes a method based on the bidirectional long short-term memory conditional random field (BiLSTM-CRF) model that leverages integrated (ensemble) learning to improve the efficiency of Chinese NER. Compared with single models, including CRF, BiLSTM-CRF, and gated recurrent unit-CRF (GRU-CRF), the proposed method can significantly improve the accuracy of open-domain Chinese NER.
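The abstract does not state how the integrated (ensemble) model combines its member models. One simple possibility, sketched here purely as an assumption, is token-level majority voting: each member (for example CRF, BiLSTM-CRF, and GRU-CRF) emits one BIO tag per Chinese character, the most frequent tag wins at each position, a repair pass fixes any invalid I- tags the vote produces, and entity spans are then read off. The function names `majority_vote`, `repair_bio`, and `extract_entities`, the first-model tie-breaking rule, and the example tag sequences are all hypothetical illustrations, not the authors' published method.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating one possible ensemble strategy
# (token-level majority voting over BIO tags); not taken from the paper.


def repair_bio(tags: List[str]) -> List[str]:
    """Convert any I-X tag that does not continue an X entity into B-X."""
    fixed: List[str] = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


def majority_vote(predictions: List[List[str]]) -> List[str]:
    """Merge per-character BIO tags from several models by majority vote.

    Ties go to the model listed first (an assumption: list the strongest
    single model, e.g. BiLSTM-CRF, first).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must tag the same number of characters")
    voted: List[str] = []
    for i in range(length):
        column = [p[i] for p in predictions]
        counts = Counter(column)
        top = max(counts.values())
        # Scan in model order so the first model wins a tie.
        voted.append(next(tag for tag in column if counts[tag] == top))
    # Voting can yield an I-X with no preceding X, so repair the sequence.
    return repair_bio(voted)


def extract_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) spans from a BIO-tagged character sequence."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_chars: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_chars.append(ch)  # continue the open entity
            continue
        if current_type is not None:  # close the open entity
            entities.append((current_type, "".join(current_chars)))
        if tag.startswith("B-"):
            current_type, current_chars = tag[2:], [ch]
        else:  # "O" or a stray I- tag ends any entity
            current_type, current_chars = None, []
    if current_type is not None:
        entities.append((current_type, "".join(current_chars)))
    return entities


if __name__ == "__main__":
    chars = list("张三在北京大学工作")  # "Zhang San works at Peking University"
    # Hypothetical outputs of three single models on the same sentence.
    bilstm_crf = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]
    gru_crf = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]
    crf = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O", "O"]
    voted = majority_vote([bilstm_crf, gru_crf, crf])
    print(extract_entities(chars, voted))  # → [('PER', '张三'), ('ORG', '北京大学')]
```

In this toy run the two agreeing recurrent models outvote the CRF's LOC reading of 北京, so the full organization name is recovered. Plain voting is only one option; weighted voting or stacking a meta-classifier over member outputs are common alternatives, and the abstract does not indicate which, if any, the authors used.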
Source journal: International Journal of Crowd Science
Subject area: Decision Sciences (miscellaneous)
CiteScore: 2.70
Self-citation rate: 0.00%
Articles published: 20
Review time: 24 weeks