{"title":"Extracting Knowledge Graphs from Financial Filings: Extended Abstract","authors":"J. Pujara","doi":"10.1145/3077240.3077246","DOIUrl":"https://doi.org/10.1145/3077240.3077246","url":null,"abstract":"Textual corpora, such as financial documents, contain a wealth of knowledge. Recently, knowledge graphs have become a popular approach to capturing structured knowledge of entities and their interrelationships. In this paper, we evaluate open information extraction (IE) and knowledge graph construction techniques for assessing the relevance of textual segments in the Financial Entity Identification and Information Integration Challenge. Our approach is to extract several textual signals, including topics and open IE triples, and combine these in a probabilistic framework to predict the relevance of each potential relationship.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"290 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116401966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Financial Entity Identification and Information Integration (FEIII) 2017 Challenge: The Report of the Organizing Committee","authors":"L. Raschid, D. Burdick, M. Flood, John Grant, J. Langsam, I. Soboroff","doi":"10.1145/3077240.3077248","DOIUrl":"https://doi.org/10.1145/3077240.3077248","url":null,"abstract":"This report presents the goals and outcomes of the 2017 Financial Entity Identification and Information Integration (FEIII) Challenge. We describe the dataset and challenge task and the protocol to create labeled data. The report summarizes the process, outcomes and plans for the 2018 Challenge.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117116473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Feature Factored System for Scoring Extracted Passage Relevance in Regulatory Filings","authors":"Denys Proux, Claude Roux, Ágnes Sándor, Julien Perez","doi":"10.1145/3077240.3077251","DOIUrl":"https://doi.org/10.1145/3077240.3077251","url":null,"abstract":"We report in this paper our contribution to the FEIII 2017 challenge addressing relevance ranking of passages extracted from 10-K and 10-Q regulatory filings. We leveraged our previous work on document structure and content analysis for regulatory filings to train hybrid text analytics and decision making models. We designed and trained several layers of classifiers fed with linguistic and semantic features to improve relevance prediction. We discuss in this paper our experiments and results on the competition data set.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132764825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FactSet: The Advantage of Scored Data","authors":"R. Hicks","doi":"10.1145/3077240.3077256","DOIUrl":"https://doi.org/10.1145/3077240.3077256","url":null,"abstract":"","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129670224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Features for Ranking Relationships Between Financial Entities based on Text","authors":"Tim Repke, M. Loster, Ralf Krestel","doi":"10.1145/3077240.3077252","DOIUrl":"https://doi.org/10.1145/3077240.3077252","url":null,"abstract":"","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129856935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Re-defining Relation Understanding in Financial Domain","authors":"Chenguang Wang, D. Burdick, Laura Chiticariu, R. Krishnamurthy, Yunyao Li, Huaiyu Zhu","doi":"10.1145/3077240.3077254","DOIUrl":"https://doi.org/10.1145/3077240.3077254","url":null,"abstract":"We describe our experiences in participating in the scored task for the 2017 FEIII Data Challenge. Our approach is to model the problem as a binary classification problem and train an ensemble model leveraging domain features that capture financial terminology. We share challenge results for our submission, which performed well achieving the highest score in four out of six evaluation criteria. We describe semantic complexities encountered with regards to the task definition and ambiguities in the labeled dataset. We present an alternative task formulation Relationship Validation that addresses some of these semantic complexities and demonstrate how our approach naturally extends to this simplified task definition.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121912616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entity relationship ranking using differential keyword-role affinity","authors":"Rohit Naini, Pawan Yadav","doi":"10.1145/3077240.3077255","DOIUrl":"https://doi.org/10.1145/3077240.3077255","url":null,"abstract":"Identifying relationship between named entities from a corpus of text is a well studied NLP problem. In this paper, we consider a tractable version of this wherein sample text snippets and corresponding roles are extracted and need to be ranked on relevance of text to the role. Our scoring approach uses a cumulative estimated relevance of all keywords observed in the text snippet. Relevance metrics are computed based on differential affinity of keywords to the roles observed in the training data.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122153578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Financial Relationships Using Probabilistic Topic Models (Demonstration Paper)","authors":"L. Raschid, Zheng Xu, Elena Zotkina","doi":"10.1145/3077240.3077247","DOIUrl":"https://doi.org/10.1145/3077240.3077247","url":null,"abstract":"Understanding relationships among financial entities can provide insight into the behavior of complex financial eco-systems. In this demonstration paper, we consider datasets of financial documents that describe the activity or role played by a financial institution (FI), typically with respect to a financial product or another financial entity. We develop community models based on financial institutions (FI) and their behavior or activity described by their roles (Role). Our models are based on an intuitive assumption that FIs will form communities, and FIs within a community are more likely to collaborate with other FIs in that community, and to play the same role, in other communities. Inspired by the Latent Dirichlet Allocation (LDA) and topic models, we develop several probabilistic financial community models and we use those models to identify interesting financial communities in two datasets.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130348652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thomson Reuters' Submission to the FEIII 2017 Challenge Non-scored Tasks","authors":"E. Roman, B. Ulicny, Yi Du, Srijith Poduval, A. Ko","doi":"10.1145/3077240.3077244","DOIUrl":"https://doi.org/10.1145/3077240.3077244","url":null,"abstract":"In this paper we describe a machine learning approach to predict roles of extracted SEC triples for the non-scored task of the 2017 FEIII Challenge. In addition, we describe a graph and data analysis derived from SEC triples.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134139245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web Text-based Network Industry Classifications: Preliminary Results","authors":"Eric Heiden, Gerard Hoberg, Craig A. Knoblock, Palak Modi, G. Phillips, Gaurangi Raul, Pedro A. Szekely","doi":"10.1145/3077240.3077245","DOIUrl":"https://doi.org/10.1145/3077240.3077245","url":null,"abstract":"Studies of market structure and product market competition are important in many disciplines, such as economics, finance, accounting and management. Reliable data for such studies is easily available for public firms (e.g., 10-K filings), but no reliable data exists for private firms. In this work we propose to mine the Internet Archive Wayback Machine, a digital archive of the World Wide Web, to build a database of 300,000 companies to support analyses of market structure, product market competition, and innovation. The goal of the WTNIC project is to download pages from the archive to build a profile for each company, and to use machine learning techniques to define similarity between companies based on similarity of their product and service offerings. This paper describes the challenges that must be overcome, our approach to overcome these challenges, and some preliminary results.","PeriodicalId":326424,"journal":{"name":"Proceedings of the 3rd International Workshop on Data Science for Macro--Modeling with Financial and Economic Datasets","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132195755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}