Multi-representation approach to text regression of financial risks

2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT) Pub Date: 2015-11-01 DOI: 10.1109/AINL-ISMW-FRUCT.2015.7382979

Roman Trusov, Alexey Natekin, Pavel Kalaidin, Sergey Ovcharenko, A. Knoll, Aida Fazylova


Citations: 5

Abstract

Approaches to textual feature extraction range from simple word-count features to deeper representations that capture distributional semantics. Recent publications have successfully used word embedding methods as the representation basis for many NLP tasks, such as text classification and part-of-speech tagging. In this article we explore using multiple text representations simultaneously within one regression task, combining the conventional bag-of-words approach with more semantically rich embeddings. We investigate the performance of this multi-representation approach on the financial risk prediction problem. Publicly available 10-K reports filed by US publicly traded companies are used as the basis for predicting the next-year change in stock price volatility. Our study shows that models based on single representations achieve performance comparable to previously published results on risk prediction, and that models with multiple representations benefit from complementary information and outperform both baseline and single-representation models.
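The abstract does not say how the representations are combined or which regressor is used, so the following is only a minimal sketch of one plausible reading: concatenate a bag-of-words vector with an averaged word-embedding vector and fit a linear regressor on the result. The vocabulary, the two-dimensional embedding table, the example documents, the target values, and the helper names (`bag_of_words`, `mean_embedding`, `multi_representation`, `fit_ridge`) are all hypothetical and chosen for illustration; none of them come from the paper.

```python
from collections import Counter

# Toy vocabulary and embedding table. Both are invented for illustration
# and are NOT taken from the paper.
VOCAB = ["risk", "loss", "growth", "debt"]
EMBEDDINGS = {
    "risk": [1.0, 0.0],
    "loss": [0.8, 0.2],
    "growth": [0.0, 1.0],
    "debt": [0.6, 0.4],
}
EMBED_DIM = 2


def bag_of_words(tokens):
    """Term-frequency vector over VOCAB (counts divided by document length)."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in VOCAB]


def mean_embedding(tokens):
    """Average the embedding vectors of the tokens present in EMBEDDINGS."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * EMBED_DIM
    return [sum(v[i] for v in known) / len(known) for i in range(EMBED_DIM)]


def multi_representation(text):
    """Concatenate both representations into a single feature vector."""
    tokens = text.lower().split()
    return bag_of_words(tokens) + mean_embedding(tokens)


def predict(weights, bias, features):
    """Linear prediction: bias plus the dot product of weights and features."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fit_ridge(X, y, l2=0.01, lr=0.1, epochs=2000):
    """Gradient descent on mean squared error with an L2 penalty on weights."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals are computed once per epoch, before any weight changes.
        errors = [predict(weights, bias, x) - t for x, t in zip(X, y)]
        for j in range(d):
            grad = 2 * sum(e * x[j] for e, x in zip(errors, X)) / n
            weights[j] -= lr * (grad + 2 * l2 * weights[j])
        bias -= lr * 2 * sum(errors) / n
    return weights, bias


# Hypothetical documents and made-up volatility-change targets.
docs = ["risk loss debt", "growth growth", "risk growth"]
targets = [0.9, 0.1, 0.5]
X = [multi_representation(d) for d in docs]
weights, bias = fit_ridge(X, targets)
```

Concatenation is the simplest way to let one model see both views of a document; other combinations, such as training separate models per representation and blending their predictions, would fit the abstract's description equally well.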



