Multi-Entity Polarity Analysis in Financial Documents

Brazilian Symposium on Multimedia and the Web | Pub Date: 2014-11-18 | DOI: 10.1145/2664551.2664574

Javier Zambrano Ferreira, Josiane Rodrigues, Marco Cristo, D. Oliveira


Citations: 9

Abstract

The volume of information available on the Internet makes manual content analysis impractical, so automated analyses are used to identify information of interest. One increasingly important approach is polarity analysis: the classification of a text document as positive, negative, or neutral with respect to a given topic. This classification is particularly useful in the finance domain, where news about a company can affect the performance of its stock. Most methods in the financial domain assume that the whole document is associated with a single entity, but this is not always the case. In fact, authors commonly cite several entities in a single document, and these entities are cited with different polarities. Accordingly, the objective of this paper was to study strategies for polarity detection in financial documents that mention multiple entities. Specifically, we studied methods that learn multiple models, one for each observed entity, using SVM classifiers. We evaluated models built by partitioning documents into fragments according to the entities they cite, using several segmentation heuristics based on shallow and deep natural language processing (NLP). We found that entity-specific models created by partitioning the document collection into segments outperformed the strategy that uses entire documents. We also observed that more complex segmentation using anaphora resolution did not outperform a low-cost approach based on simple string matching.
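As a rough illustration of the low-cost segmentation idea mentioned in the abstract, the sketch below groups the sentences of a document by the entities they mention, using plain case-insensitive string matching. The abstract does not specify the actual heuristics, so the function name `segment_by_entity`, the alias dictionary, the naive sentence splitter, and the example text are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import defaultdict


def segment_by_entity(document, entity_aliases):
    """Return one text fragment per entity, built from the sentences that mention it.

    `entity_aliases` maps a (hypothetical) entity name to the strings that
    count as a mention of it. Matching is a case-insensitive substring test,
    so it can also match inside longer words.
    """
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    fragments = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        for entity, aliases in entity_aliases.items():
            # A sentence that mentions several entities is added to each of them.
            if any(alias.lower() in lowered for alias in aliases):
                fragments[entity].append(sentence)

    # Join each entity's sentences into a single fragment, preserving order.
    return {entity: " ".join(parts) for entity, parts in fragments.items()}


# Invented example text and entity names, for illustration only.
doc = (
    "Acme shares rose after strong earnings. "
    "Globex cut its forecast and the stock fell. "
    "Analysts expect Acme to keep growing."
)
aliases = {"Acme": ["Acme"], "Globex": ["Globex"]}
fragments = segment_by_entity(doc, aliases)
```

In a setup like the one the abstract describes, each entity's fragments (rather than whole documents) would then serve as training and test instances for that entity's own classifier; the classifier itself is not shown here.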


