Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets: Latest Publications

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547

Citations: 0

Hybrid Link Prediction for Competitor Relationships

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220559

J. Pujara

Competitor relationships are integral to many important financial applications. Example use cases include understanding regulatory impacts, investing in new business areas, and building economic models. Competitor relationships can be defined based on several aspects, including valuations and asset returns, industrial processes, or offerings of products and services. Determining these relationships is often challenging because of the diverse and complex interactions between companies, which must be mined from vast datasets with varying degrees of credibility. In this paper, we approach this problem by constructing a hybrid knowledge graph that captures financial relationships and applying a link prediction model to identify missing competitor relationships. Knowledge graphs are a popular knowledge representation choice for capturing entities and the relationships between them. Knowledge graph construction typically uses only a single type of input data, such as relationships mined from text using information extraction techniques or curated relationships from relational databases. In contrast, the FEIII Challenge provides several sources of relationships from different types of input, including expert judgments, mined relationships, and statistical features. Our approach creates a hybrid knowledge graph that includes relationships derived from three very different types of data in a single knowledge graph. We construct this graph using the data provided for the FEIII Challenge and one additional source, the webpages of the companies included in the challenge. The first data source we use consists of expert judgments curated by the Thomson Reuters Data Fusion (TRDF) platform. The second data source provided in the challenge consists of relationships extracted from text found in SEC filings. Finally, we introduce a third set of statistical signals, derived primarily from collecting webpages of the companies in the …
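The abstract does not describe the link prediction model itself. The following is only a minimal sketch of the general idea, and it substitutes a simple weighted common-neighbours heuristic for whatever model the paper actually uses. Edges from the three input types are merged into one graph, and each source receives a confidence weight. `SOURCE_WEIGHTS` and its values are hypothetical, not taken from the paper. Unlinked company pairs are then ranked by the evidence that flows through their shared neighbours.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical confidence per input type; the paper's actual weighting is not given.
SOURCE_WEIGHTS = {"expert": 1.0, "sec_text": 0.6, "web_similarity": 0.4}

def build_hybrid_graph(edges):
    """Merge (company_a, company_b, source) edges into one weighted, undirected graph."""
    graph = defaultdict(dict)
    for a, b, source in edges:
        weight = SOURCE_WEIGHTS[source]
        # When several sources link the same pair, keep the strongest evidence.
        graph[a][b] = max(graph[a].get(b, 0.0), weight)
        graph[b][a] = max(graph[b].get(a, 0.0), weight)
    return graph

def score_missing_links(graph):
    """Rank unlinked pairs by weighted common-neighbour evidence (highest first)."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already linked, nothing to predict
        shared = graph[a].keys() & graph[b].keys()
        score = sum(graph[a][n] * graph[b][n] for n in shared)
        if score > 0:
            scores[(a, b)] = score
    return sorted(scores.items(), key=lambda item: -item[1])

edges = [
    ("A", "B", "expert"),
    ("B", "C", "expert"),
    ("A", "D", "sec_text"),
    ("D", "C", "web_similarity"),
]
ranked = score_missing_links(build_hybrid_graph(edges))
print([(pair, round(score, 2)) for pair, score in ranked])
# [(('A', 'C'), 1.24), (('B', 'D'), 1.0)]
```

Keeping the maximum weight per pair is only one way to reconcile overlapping sources. Summing the weights or learning them from labelled competitor pairs would be natural alternatives.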

Citations: 1

The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220553

M. Loster, Tim Repke, Ralf Krestel, Felix Naumann, Jan Ehmueller, Benjamin Feldmann, Oliver Maspfuhl

1 OVERVIEW & MOTIVATION

Integrating a wide range of structured and unstructured information sources into a uniformly integrated knowledge base is an important task in the financial sector. For example, modern risk analysis methods can benefit greatly from an integrated knowledge base, in particular from a dedicated, domain-specific knowledge graph. Knowledge graphs can be used to gain a holistic view of the current economic situation so that systemic risks are identified early enough to react appropriately. This graph structure thus allows the investigation of many financial scenarios, such as the impact of a corporate bankruptcy on other market participants within the network. In this scenario, the links between individual market participants can be used to determine which companies are affected by a bankruptcy and to what extent. These considerations motivated us to start developing a system capable of constructing and maintaining a knowledge graph of financial entities and their relationships. The envisioned system generates this graph by extracting and combining information from structured data sources such as Wikidata and DBpedia, as well as from unstructured data sources such as newspaper articles and financial filings. In addition, the system should incorporate proprietary data sources, such as financial transactions (structured) and credit reports (unstructured).

The ultimate goal is a system that recognizes financial entities in structured and unstructured sources, links them to the information in a knowledge base, and then extracts the relations that the text expresses between the identified entities. The resulting knowledge base can then be used to build the desired knowledge graph. Our system design consists of several components, each of which addresses a specific subproblem. Figure 1 gives a general overview of our system and its subcomponents.
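The bankruptcy scenario above can be made concrete with a small traversal over a finished graph. This is a minimal sketch, not the authors' system. It assumes a hypothetical `exposures` mapping in which each edge carries a share in [0, 1] that describes how strongly one firm depends on another. Impact propagates multiplicatively along the strongest path, and effects below a cut-off are ignored.

```python
from collections import deque

def affected_by_bankruptcy(exposures, bankrupt, min_impact=0.05):
    """Propagate a bankruptcy through a hypothetical exposure graph.

    exposures: {firm: {dependent_firm: share}}, where share in [0, 1] says how
    strongly the dependent firm relies on `firm` (an illustrative assumption).
    Returns {firm: impact} for every firm reached with impact >= min_impact.
    """
    impact = {bankrupt: 1.0}
    queue = deque([bankrupt])
    while queue:
        firm = queue.popleft()
        for dependent, share in exposures.get(firm, {}).items():
            propagated = impact[firm] * share
            # Revisit a firm only if a stronger path to it has been found.
            if propagated >= min_impact and propagated > impact.get(dependent, 0.0):
                impact[dependent] = propagated
                queue.append(dependent)
    del impact[bankrupt]
    return impact

exposures = {
    "BankruptCo": {"SupplierA": 0.5, "LenderB": 0.2},
    "SupplierA": {"LenderB": 0.6, "SmallC": 0.05},
}
print(affected_by_bankruptcy(exposures, "BankruptCo"))
# {'SupplierA': 0.5, 'LenderB': 0.3}
```

LenderB's direct exposure (0.2) is replaced by the stronger indirect path through SupplierA (0.5 × 0.6 = 0.3). SmallC's indirect impact (0.025) falls below the cut-off and is dropped.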

Citations: 1

Analysis of year-over-year changes in Risk Factors Disclosure in 10-K filings

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220555

Vipula Rawte, Aparna Gupta, Mohammed J. Zaki

The Risk Factor Disclosures (Item 1A) in 10-K forms filed with the SEC are among the most important sections of these filings. They contain a company's yearly risk updates and thus help investors decide whether or not to invest in a company. It is crucial to read this section carefully in order to make better investment choices. Given the large number of such forms filed every year, it is very cumbersome for humans to read and analyze them thoroughly enough to make informed decisions. We discuss the task of bank failure classification using textual analysis of Item 1A in various banks' 10-K forms, i.e., predicting whether or not a bank will fail. We also analyze other quantitative bank performance indicators, such as leverage and Return On Assets (ROA), and examine how well text-based methods can predict those risk indicators. In particular, to create our textual corpora, we focus on the changes in the 1A sections. We retain only those sentences whose similarity across two consecutive years (for the same bank) is under a threshold of 30% or 40%. We implement deep learning and other supervised learning techniques, namely Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Linear Regression. We also combine word sentiment polarities with their counts to form a weighted feature vector.
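The year-over-year filtering step can be sketched as follows. The abstract does not name the similarity measure, so this sketch assumes word-set Jaccard similarity. Each current-year sentence is compared with its closest match in the previous year's Item 1A, and the sentence is kept only if that best match falls below the threshold. `changed_sentences` and its tokenizer are hypothetical helpers.

```python
import re

def tokens(sentence):
    """Lower-cased set of alphanumeric words (deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def jaccard(a, b):
    """Word-set Jaccard similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def changed_sentences(current_year, previous_year, threshold=0.3):
    """Keep current-year sentences whose closest previous-year match is below `threshold`."""
    previous = [tokens(s) for s in previous_year]
    kept = []
    for sentence in current_year:
        words = tokens(sentence)
        best = max((jaccard(words, p) for p in previous), default=0.0)
        if best < threshold:
            kept.append(sentence)
    return kept

previous_1a = ["Our loan portfolio is concentrated in commercial real estate."]
current_1a = [
    "Our loan portfolio is concentrated in commercial real estate.",
    "Rising interest rates may reduce demand for mortgages.",
]
print(changed_sentences(current_1a, previous_1a, threshold=0.4))
# ['Rising interest rates may reduce demand for mortgages.']
```

Running the filter twice, with `threshold=0.3` and then `threshold=0.4`, would yield the two corpora described above. Cosine similarity over TF-IDF vectors is a common alternative to Jaccard.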

Citations: 10

Feature Selection Methods For Understanding Business Competitor Relationships

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220550

Rahul Gupta, J. Pujara, Craig A. Knoblock, S. Sharanappa, Bharat Pulavarti, Gerard Hoberg, G. Phillips

Understanding competition between businesses is essential for assessing the likely success of new ventures or products, for making decisions before investing capital in new businesses, and for understanding the impacts of regulatory policy. One important resource for analyzing competitor relationships is business webpages, which can capture the mission, products, services, and key markets associated with a company. However, webpages also contain irrelevant, extraneous, or misleading text, which hampers prediction. To address this challenge, predictive models use a process known as feature selection to retain only relevant terms. The diversity and specificity of business domains pose a challenge for automated feature selection. In this paper, we compare two approaches to feature selection: manually curated lists of terms provided by experts, and automated approaches. We evaluate several approaches to feature selection and their impact on predicting competitor relationships. We demonstrate that carefully designed automated feature selection approaches can surpass the performance of manually curated word lists by 10%.
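To illustrate why feature selection matters for webpage-based competitor prediction, here is a minimal sketch. It assumes one simple automated strategy, a document-frequency filter, and compares companies by cosine similarity over the selected terms. The thresholds, helper names, and toy pages are illustrative assumptions. The paper evaluates several selection approaches that the abstract does not detail. A manually curated term list could be passed as the `vocabulary` argument instead.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Bag of lower-cased alphabetic words for one webpage."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def select_by_document_frequency(pages, min_df=2, max_df_ratio=0.8):
    """Automated selection: drop terms that are too rare or appear on too many pages."""
    df = Counter()
    for counts in pages:
        df.update(counts.keys())  # each term counted once per page
    limit = max_df_ratio * len(pages)
    return {term for term, n in df.items() if min_df <= n <= limit}

def cosine(a, b, vocabulary):
    """Cosine similarity of two bags of words, restricted to `vocabulary`."""
    dot = sum(a[t] * b[t] for t in vocabulary)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in vocabulary))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in vocabulary))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

pages = {
    "BankA": term_counts("mortgage loans deposit accounts customers welcome"),
    "BankB": term_counts("mortgage loans deposit customers welcome"),
    "InsurerC": term_counts("life insurance policies customers welcome"),
    "InsurerD": term_counts("life insurance annuities customers welcome"),
}
selected = select_by_document_frequency(list(pages.values()))
everything = set().union(*pages.values())
print(sorted(selected))  # ['deposit', 'insurance', 'life', 'loans', 'mortgage']
print(round(cosine(pages["BankA"], pages["InsurerC"], everything), 2))  # 0.37
print(cosine(pages["BankA"], pages["InsurerC"], selected))  # 0.0
```

With the full vocabulary, generic words such as "customers" and "welcome" make a bank look similar to an insurer. The filter removes those words because they appear on every page.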

Citations: 2

Predicting competitor links in company networks

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220558

César de Pablo-Sánchez, E. T. Herruzo, Alberto Rubio

Citations: 0

An Ontology of Ownership and Control Relations for Bank Holding Companies

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220551

Liju Fan, M. Flood

We consider the challenges and benefits of ontologies for managing information in regulatory reporting from bank holding companies (BHCs). Many BHCs, especially the largest and most complex firms, have multiple federal supervisors who oversee a diverse array of subsidiaries. This creates a federated data management problem that disperses information across many firms and regulators. We prototype an ontology for the Federal Reserve's public National Information Center (NIC) database. The NIC identifies all BHCs, their subsidiaries, and the ownership and control relationships among them. It is a basic official source on the structure of the industry. A formal ontology can capture this expert-curated knowledge in a coherent, structured format. This could assure data integrity and enable non-experts to more readily integrate and analyze data about complex organizations. We test the design and development of federated prototype ontologies in OWL/RDF to provide and integrate the NIC data with precise semantics for transparency and consistency. Our preliminary results indicate that this approach is feasible in practice for data search and analysis. They also indicate that the ontologies can facilitate semantic integration and improve the integrity of data and metadata.
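Ownership and control data of the kind held in the NIC naturally forms subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The `ex:` names are hypothetical and are not the prototype ontology's vocabulary. The sketch derives indirect control by following `controls` edges transitively. In OWL, the same inference would typically come from declaring the property an `owl:TransitiveProperty` and running a reasoner, assuming transitivity is the intended semantics.

```python
# Hypothetical vocabulary; the prototype NIC ontology's real class and property names may differ.
TYPE, CONTROLS = "rdf:type", "ex:controls"

facts = {
    ("ex:HoldCo", TYPE, "ex:BankHoldingCompany"),
    ("ex:HoldCo", CONTROLS, "ex:MidCo"),
    ("ex:MidCo", CONTROLS, "ex:BankX"),
    ("ex:BankX", TYPE, "ex:CommercialBank"),
}

def controlled_entities(triples, parent):
    """Everything reachable from `parent` via `controls` edges (transitive closure)."""
    children = {}
    for subject, predicate, obj in triples:
        if predicate == CONTROLS:
            children.setdefault(subject, []).append(obj)
    seen, stack = set(), [parent]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def instances_of(triples, cls):
    """Subjects asserted to be of type `cls`."""
    return {s for s, p, o in triples if p == TYPE and o == cls}

# Which commercial banks does HoldCo control, directly or indirectly?
print(controlled_entities(facts, "ex:HoldCo") & instances_of(facts, "ex:CommercialBank"))
# {'ex:BankX'}
```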

Citations: 3

Financial Entity Identification and Information Integration (FEIII) 2018 Challenge: The Report of the Organizing Committee

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3225218

L. Raschid, D. Burdick, John Grant, J. Langsam, J. Pujara, E. Roman, I. Soboroff, Mohammed J. Zaki, Elena Zotkina

Citations: 0

Defining and Capturing the Competitor Relationship across Financial Datasets

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220556

Min Li, D. Burdick, R. Krishnamurthy, Lucian Popa

The 2018 FEIII Data Challenge aims to enhance a given knowledge graph by validating and enriching the set of competitor edges in the graph using multiple datasets. On investigating the data, we find that some of the competitor edges given as training data are inconsistent (e.g., they conflict with other relationships such as parent/subsidiary). Rather than using a machine learning approach that would have to address such difficulties and other ambiguities in the training data, we start by formulating two natural, semantic definitions of a competitor relationship. The first is a weak definition that is independent of the training data. It identifies pairs of entities as potential competitors whenever they are in the same industry and geographical location, provided that there is no negative evidence (such as the two entities being in the same family of companies). The second is a strong definition that intersects the pairs of entities obtained from the weak definition with the competitors given in the training dataset. Together, these two definitions form a framework that can be extended to use other attributes or additional information when available. One such extension, which we can implement right away with the available data, is to lift competitor relationships from subsidiaries to their respective parent companies. We use a high-level language for entity linking (HIL) to express and implement our two semantic definitions as well as the parent-lifting extension. The resulting HIL algorithms are readable and can easily be extended or modified by a domain expert. We show that our submission achieves 19.6% precision, 40.3% recall, and a 26.4% F1 score. We argue that these results can be further improved as more data and analytics become available.
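Both definitions and the parent-lifting extension reduce to simple set operations. The paper expresses them in HIL. The following Python is only a sketch of the same logic, not the authors' code. It assumes that each entity is described by an (industry, location) pair and that sharing a corporate family is the only negative evidence. The final lines check that the reported F1 score is consistent with the reported precision and recall, since F1 = 2PR / (P + R).

```python
from itertools import combinations

def weak_competitors(entities, family):
    """Weak definition: same industry and location, and no negative evidence.

    entities: {name: (industry, location)}
    family:   {name: family_id}; entities absent from it have no known family.
    """
    pairs = set()
    for a, b in combinations(sorted(entities), 2):
        same_family = a in family and b in family and family[a] == family[b]
        if entities[a] == entities[b] and not same_family:
            pairs.add((a, b))
    return pairs

def strong_competitors(weak_pairs, training_pairs):
    """Strong definition: weak pairs that also appear in the training data."""
    normalized = {tuple(sorted(p)) for p in training_pairs}
    return weak_pairs & normalized

def lift_to_parents(pairs, parent):
    """Parent lifting: replace each subsidiary with its parent, if one is known."""
    lifted = set()
    for a, b in pairs:
        pa, pb = parent.get(a, a), parent.get(b, b)
        if pa != pb:  # skip pairs that collapse onto a single company
            lifted.add(tuple(sorted((pa, pb))))
    return lifted

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

entities = {
    "BankA": ("banking", "NY"),
    "BankB": ("banking", "NY"),
    "BankC": ("banking", "NY"),
    "InsurerD": ("insurance", "NY"),
}
family = {"BankA": "FamA", "BankB": "FamA"}  # BankA and BankB are affiliates
weak = weak_competitors(entities, family)
strong = strong_competitors(weak, {("BankC", "BankA")})
lifted = lift_to_parents(strong, {"BankA": "HoldA", "BankC": "HoldC"})
print(sorted(weak))  # [('BankA', 'BankC'), ('BankB', 'BankC')]
print(strong)        # {('BankA', 'BankC')}
print(lifted)        # {('HoldA', 'HoldC')}
print(round(f1(0.196, 0.403), 3))  # 0.264
```

The weak definition deliberately over-generates candidate pairs, which is consistent with the submission's recall (40.3%) being roughly twice its precision (19.6%).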

Citations: 0

PREFER

Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets Pub Date : 2018-06-15 DOI: 10.1145/3220547.3220557

Ho-Yong Lee, Jongseon Park, Hyungjun Kim, H. Cho, Geonsoo Kim

Citations: 1