Multi-Entity Polarity Analysis in Financial Documents

Javier Zambrano Ferreira, Josiane Rodrigues, Marco Cristo, D. Oliveira
Brazilian Symposium on Multimedia and the Web, published 2014-11-18
DOI: 10.1145/2664551.2664574 (https://doi.org/10.1145/2664551.2664574)
Citations: 9

Abstract

The volume of information available on the Internet makes manual content analysis impractical, so automated methods are used to identify information of interest. One increasingly important approach is polarity analysis: the classification of a text document as positive, negative, or neutral with respect to a given topic. This classification is particularly useful in the finance domain, where news about a company can affect the performance of its stock. Most methods in the financial domain assume that the whole document is associated with a single entity, but this is not always the case. Authors commonly cite several entities in a single document, and with different polarities. Accordingly, the objective of this paper was to study strategies for polarity detection in financial documents with multiple entities. Specifically, we studied methods based on learning multiple models, one for each observed entity, using SVM classifiers. We evaluated models based on partitioning documents into fragments according to the entities they cite, using several segmentation heuristics based on shallow and deep natural language processing (NLP). We found that entity-specific models created by partitioning the document collection into segments outperformed the strategy based on entire documents. We also observed that more complex segmentation using anaphora resolution did not outperform a low-cost approach based on simple string matching.
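As a rough illustration of the low-cost string-matching segmentation mentioned in the abstract, the sketch below assigns each sentence of a document to every entity whose name appears in it. This is a minimal sketch under stated assumptions, not the authors' implementation: the alias table, the entity names (Acme, Globex), the naive sentence splitter, and the function names are all hypothetical, and the abstract does not specify the actual heuristics used.

```python
import re
from collections import defaultdict

# Hypothetical alias table (an assumption for illustration only):
# each entity maps to the surface strings that count as a mention.
ALIASES = {
    "Acme": ["Acme", "Acme Corp"],
    "Globex": ["Globex"],
}

def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def segment_by_entity(text, aliases):
    """Return {entity: [sentences mentioning one of its aliases]}."""
    # One word-bounded alternation pattern per entity.
    patterns = {
        entity: re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
        for entity, names in aliases.items()
    }
    fragments = defaultdict(list)
    for sentence in split_sentences(text):
        for entity, pattern in patterns.items():
            if pattern.search(sentence):
                # A sentence citing several entities goes to each of them.
                fragments[entity].append(sentence)
    return dict(fragments)

doc = (
    "Acme shares rose after the earnings report. "
    "Analysts remain cautious about Globex. "
    "The market closed higher overall."
)
fragments = segment_by_entity(doc, ALIASES)
```

In a pipeline of the kind the abstract describes, the fragments collected for each entity would then serve as training or test input for that entity's own classifier, instead of the full document. Sentences that mention no entity, such as the third one here, are simply left out of every fragment in this sketch.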