A proposal to identify stakeholders from news for the institutional relationship management activities of an institution based on Named Entity Recognition using BERT
Eric Hans Messias Da Silva, J. Laterza, Marcos Paulo Pereira Da Silva, M. Ladeira
Published in: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1569-1575, 2021-12-01
DOI: 10.1109/ICMLA52953.2021.00251 (https://doi.org/10.1109/ICMLA52953.2021.00251)
Citations: 1
Abstract
For an organization’s institutional relationship activities, an efficient process for identifying and characterizing stakeholders from available information is strategically important. Given the growing volume of available data, this process is commonly supported by information technology solutions, and it has high potential for data mining techniques such as textual analysis and natural language processing (NLP). In this work we analyzed the feasibility of a Named Entity Recognition (NER) mechanism based on Bidirectional Encoder Representations from Transformers (BERT) with a Conditional Random Field (CRF), which could in the future replace rule-based identification as the stakeholder identification solution. We applied the proposed solution to a news dataset to evaluate its performance. The experimental results showed that pre-trained Portuguese models outperformed multilingual ones by at least 3.43 percentage points on the test dataset. We also added a post-processing step, Prediction Masking, to correct invalid tagging scheme transitions, which improved the micro F1 score on both datasets by 0.38 to 1.29 percentage points. Thus, we achieved the objective of improving stakeholder detection by proposing an NER model that far surpasses the naive rule-based approach of the current application, which consisted of an exact text match of stakeholders against a manually built dictionary.
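The abstract does not say how Prediction Masking is implemented, so the following is only a minimal sketch of one common way to correct invalid tagging scheme transitions. It assumes an IOB2 tagging scheme, per-token tag scores, and greedy left-to-right decoding; the function names and the example tag set are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def is_valid_transition(prev_tag: str, tag: str) -> bool:
    """IOB2 rule (assumed scheme): I-X may only follow B-X or I-X."""
    if not tag.startswith("I-"):
        return True  # "O" and any "B-*" tag are always allowed
    entity = tag[2:]
    return prev_tag in (f"B-{entity}", f"I-{entity}")


def masked_greedy_decode(scores: List[Dict[str, float]]) -> List[str]:
    """Pick, for each token, the best-scoring tag that forms a valid
    transition from the previously chosen tag.

    Assumes every token's score dict contains "O", so at least one
    tag is always allowed.
    """
    decoded: List[str] = []
    prev = "O"  # the sentence start is treated like an "O" tag
    for token_scores in scores:
        allowed = {
            tag: score
            for tag, score in token_scores.items()
            if is_valid_transition(prev, tag)
        }
        best = max(allowed, key=allowed.get)
        decoded.append(best)
        prev = best
    return decoded


# Hypothetical scores for a four-token sentence.
example_scores = [
    {"O": 0.1, "B-ORG": 0.2, "I-ORG": 0.7},  # I-ORG is masked at sentence start
    {"O": 0.2, "B-ORG": 0.1, "I-ORG": 0.7},  # I-ORG is valid after B-ORG
    {"O": 0.6, "B-ORG": 0.1, "I-ORG": 0.3},
    {"O": 0.3, "B-ORG": 0.2, "I-ORG": 0.5},  # I-ORG is masked after O
]
print(masked_greedy_decode(example_scores))
```

With a CRF output layer, the same constraint is often applied inside Viterbi decoding by giving invalid transitions a very large negative score; the greedy version above is used here only because it is short enough to follow by hand.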