Chinese nested entity recognition method for the finance domain based on heterogeneous graph network

IF 7.4 | Zone 1 (Management) | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS
Han Zhang, Yiping Dang, Yazhou Zhang, Siyuan Liang, Junxiu Liu, Lixia Ji
DOI: 10.1016/j.ipm.2024.103812
Journal: Information Processing & Management
Publication date: 2024-07-01
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0306457324001717
Citations: 0

Abstract

In the finance domain, nested named entity recognition has become a prominent topic within named entity recognition. Traditional nested entity recognition methods tend to ignore the dependency relationships between entities, and most of them are designed for general-domain English text. We therefore propose HGFNER, a Chinese nested entity recognition method for the finance domain based on a heterogeneous graph network. The method consists of two parts: a boundary division model for candidate entities and an internal relationship graph model for candidate entities. First, the boundary division model, which incorporates expert knowledge, partitions the flat entities contained in the text and segments the text, addressing issues such as long entity boundaries and strong domain features in the Chinese finance domain. Then, heterogeneous graphs represent the internal structure of entities in terms of both spatial and syntactic dependencies, so that dependency relationships between entities are learned from multiple perspectives. To avoid degrading the model's operational efficiency, we also propose DAAC_BM, a fast matching algorithm for n-gram sequences in domain dictionaries, which addresses the memory overflow and wasted space that multi-pattern fast matching algorithms face when matching Chinese text. In addition, we present CFNE, a Chinese nested entity dataset for the financial field, which, as far as we know, is the first publicly available annotated dataset in the field. HGFNER achieves a state-of-the-art macro-F1 of 86.41% on CFNE.
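
To make the dictionary-matching step concrete, the sketch below runs a classical Aho-Corasick automaton over a Chinese sentence and returns every dictionary hit, including hits nested inside longer ones. It is not DAAC_BM, whose internals the abstract does not describe. The function names `build_automaton` and `find_matches`, the sample dictionary, and the sample sentence are all hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain Aho-Corasick automaton (illustrative; not DAAC_BM)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # failure link of each node
    out = [[]]    # dictionary entries that end at each node

    # 1) Insert every pattern into a trie.
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)

    # 2) Breadth-first pass to set failure links and merge outputs.
    queue = deque(goto[0].values())  # depth-1 nodes keep fail = 0
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target (shorter suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start, end, entry) for every dictionary hit; end is exclusive."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, i + 1, pat))
    return matches


if __name__ == "__main__":
    # Hypothetical finance dictionary and sentence, for illustration only.
    dictionary = ["中国银行", "银行", "中国", "上海分行", "中国银行上海分行"]
    automaton = build_automaton(dictionary)
    for span in find_matches("中国银行上海分行发布公告", automaton):
        print(span)
```

Because all overlapping hits are reported in a single left-to-right pass, a short entry such as 银行 is returned alongside the longer 中国银行 that contains it, which is the kind of candidate set a nested entity recognizer would then need to resolve.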

Source journal
Information Processing & Management
Category: Engineering and Technology, Computer Science: Information Systems
CiteScore: 17.00
Self-citation rate: 11.60%
Publication volume: 276
Review time: 39 days
About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.