A proposal to identify stakeholders from news for the institutional relationship management activities of an institution based on Named Entity Recognition using BERT

Eric Hans Messias Da Silva, J. Laterza, Marcos Paulo Pereira Da Silva, M. Ladeira
Published in: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 1569-1575
Publication date: 2021-12-01
DOI: 10.1109/ICMLA52953.2021.00251 (https://doi.org/10.1109/ICMLA52953.2021.00251)
Citations: 1

Abstract

An organization's institutional relationship activities depend on an efficient process for identifying and characterizing stakeholders from available information. As the volume of available data grows, this process is commonly supported by information technology solutions, with high potential for data mining techniques such as textual analysis and natural language processing (NLP). In this work we analyze the feasibility of a Named Entity Recognition (NER) mechanism based on Bidirectional Encoder Representations from Transformers (BERT) with a Conditional Random Field (CRF), which could in the future replace rule-based identification as the stakeholder identification solution. We applied the proposed solution to a news dataset to evaluate its performance. The experimental results showed that pre-trained Portuguese models outperformed multilingual ones by at least 3.43 percentage points on the test dataset. We also added a post-processing Prediction Masking step that corrects invalid tagging scheme transitions, which improved the Micro F1 score on both datasets by 0.38 to 1.29 percentage points. We thus achieved the objective of improving stakeholder detection by proposing an NER model that far surpasses the naive rule-based approach of the current application, which consists of an exact text match of stakeholders against a manually built dictionary.
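The abstract does not say how Prediction Masking is implemented or which tagging scheme the authors use. The following is only a minimal sketch of the general idea, assuming a BIO scheme and plain greedy decoding over per-token tag scores (not the BERT-CRF model itself). The function names and the example scores are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def is_valid_transition(prev_tag: str, tag: str) -> bool:
    """Assumed BIO rule: I-X may only follow B-X or I-X of the same type."""
    if tag.startswith("I-"):
        entity = tag[2:]
        return prev_tag in (f"B-{entity}", f"I-{entity}")
    return True  # O and B-X are allowed after any tag


def masked_greedy_decode(scores: List[Dict[str, float]]) -> List[str]:
    """Pick, per token, the best-scoring tag among those valid after the previous tag.

    Each dict maps tag -> score and is assumed to contain "O",
    so at least one tag is always allowed.
    """
    decoded: List[str] = []
    prev = "O"  # the sentence start is treated like O, so I-X cannot open a sequence
    for token_scores in scores:
        allowed = {t: s for t, s in token_scores.items()
                   if is_valid_transition(prev, t)}
        prev = max(allowed, key=allowed.get)
        decoded.append(prev)
    return decoded


# Hypothetical scores for three tokens; an unmasked argmax would start with I-ORG.
example_scores = [
    {"O": 0.1, "B-ORG": 0.2, "I-ORG": 0.7},
    {"O": 0.2, "B-ORG": 0.1, "I-ORG": 0.7},
    {"O": 0.6, "B-ORG": 0.1, "I-ORG": 0.3},
]
print(masked_greedy_decode(example_scores))
```

In this sketch the invalid leading I-ORG is masked out, so the first token falls back to the best remaining tag. A CRF-based decoder could apply the same constraint inside Viterbi decoding instead of greedily; which variant the paper uses is not stated here.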