Information Retrieval from Alternative Data using Zero-Shot Self-Supervised Learning
A. Assareh
2022 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr), published 2022-05-01
DOI: 10.1109/CIFEr52523.2022.9776094 (https://doi.org/10.1109/CIFEr52523.2022.9776094)
Citations: 1
Abstract
Traditionally, in the financial services industry, financial analysts spend a large share of their time on knowledge discovery and extraction from unstructured data sources such as reports, research notes, SEC filings, earnings call transcripts, and news. Besides being inefficient, this manual information retrieval process is prone to human error, subjectivity, and inconsistency. Recent advances in representation learning provide a reliable platform for mapping large volumes of unstructured data to a high-dimensional vector space, where similarities and differences between data points can be quantified and used for featurization, pattern recognition, and information retrieval. In this work, we demonstrate that by representing terms, documents, and companies in the same informative vector space and applying a simple self-supervised learning framework, relevant companies and documents can be retrieved with a good level of accuracy for given topics of interest, even with no prior labeled data.
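
The shared-vector-space idea in the abstract can be illustrated with a minimal sketch. This is not the author's method, which the abstract does not specify. The term vectors, company names, and documents are hypothetical toy data, and averaging term vectors is an assumed stand-in for whatever representation model the paper uses. The sketch embeds terms, documents, and companies in one space and ranks companies against a topic by cosine similarity, with no labeled data involved.

```python
import math

# Hypothetical 3-dimensional term embeddings (toy values chosen by hand).
# A real system would learn these from a large corpus.
TERM_VECTORS = {
    "battery": [0.9, 0.1, 0.0],
    "electric": [0.8, 0.2, 0.1],
    "vehicle": [0.7, 0.3, 0.0],
    "loan": [0.0, 0.9, 0.2],
    "mortgage": [0.1, 0.8, 0.3],
    "cloud": [0.1, 0.1, 0.9],
    "software": [0.0, 0.2, 0.8],
}

# Hypothetical companies, each with a few short documents.
DOCUMENTS = {
    "AutoCo": ["electric vehicle battery", "battery vehicle"],
    "BankCo": ["mortgage loan", "loan"],
    "TechCo": ["cloud software", "software"],
}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embed_text(text):
    """Embed a text as the mean of the vectors of its known terms."""
    vectors = [TERM_VECTORS[t] for t in text.lower().split() if t in TERM_VECTORS]
    if not vectors:
        raise ValueError(f"no known terms in {text!r}")
    return mean_vector(vectors)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_companies(topic):
    """Rank companies by similarity between the topic and their documents."""
    query = embed_text(topic)
    scores = {}
    for company, docs in DOCUMENTS.items():
        # A company vector is the mean of its document vectors, so terms,
        # documents, and companies all live in the same space.
        company_vector = mean_vector([embed_text(d) for d in docs])
        scores[company] = cosine(query, company_vector)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for company, score in rank_companies("electric battery"):
        print(f"{company}: {score:.3f}")
```

Because the query, the documents, and the companies share one space, the same `cosine` function could also rank individual documents against a topic. Ranking by nearest neighbors needs no labels, which is the sense in which such retrieval is zero-shot.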