{"title":"Understanding Trading Interactions and Behavior in Over-the-Counter Markets","authors":"Chi-hung Chen, L. Raschid, Jinming Xue","doi":"10.1145/3336499.3338004","DOIUrl":"https://doi.org/10.1145/3336499.3338004","url":null,"abstract":"This research applies machine learning methods, in particular probabilistic topic modeling, to understand patterns of interactions for Over-the-Counter (OTC) trading in corporate bonds. The interactions are between broker-dealers (dealers) and clients, or between dealers. From reports of dealer transactions, we create documents representing the daily activity of each dealer. This includes four types of dealer activities: Buy from / Sell to a client, and Buy from / Sell to another dealer. We use Latent Dirichlet Allocation (LDA) based topic models to identify communities of bonds that are bought or sold (co-traded) on the same day. Some communities reflect an industry sector, while others have a concentration of specific bonds. Several topics temporally align to notable financial events. We group dealers around topics to understand their interactions with clients and other dealers. We observe a range of interaction patterns that merit further study, including the centrality of some dealer(s) to some topics. 
This research illustrates that topic modeling / community detection can indeed provide insight into dealer behavior for OTC trades.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129900730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracking the Evolution of Financial Time Series Clusters","authors":"Davide Azzalini, Fabio Azzalini, Mirjana Mazuran, L. Tanca","doi":"10.1145/3336499.3338006","DOIUrl":"https://doi.org/10.1145/3336499.3338006","url":null,"abstract":"Nowadays, a huge amount of applications exist that natively adopt a data-streaming model to represent highly dynamic phenomena. A challenging application is constituted by data from the stock market, where the stock prices are naturally modeled as data streams that fluctuate very much and remain meaningful only for short amounts of time. In this paper we present a technique to track evolving clusters of financial time series, with the aim of constructing reliable models for this highly dynamic application. In our technique the clustering over a set of time series is iterated over time through sliding windows and, at each iteration, the differences between the current clustering and the previous one are studied to determine those changes that are \"significant\" with respect to the application. For example, in the financial domain, if a company that has belonged to the same cluster for a certain amount of time moves to another cluster, this may be a signal of a significant change in its economical or financial situation.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132480652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling and Linking Company Data in the euBusinessGraph Platform","authors":"A. Maurino, A. Rula, Bjørn Marius von Zernichow, Mauricio Soto Gomez, B. Elvesæter, D. Roman","doi":"10.1145/3336499.3338012","DOIUrl":"https://doi.org/10.1145/3336499.3338012","url":null,"abstract":"In the business environment, knowledge of company data is essential for a variety of tasks. The European funded project euBusinessGraph enables the establishment of a company data platform where data providers and consumers can publish and access company data. The core of the platform is the semantic data model that is the conceptual representation of company data in a common way so that it is easier to share and interlink company data. In this paper we show how the unified model and Grafterizer, a tool for manipulating and transforming raw data into Linked Data, support the linking challenge proposed in FEIII 2019. Results show that geographical enrichment of RDF data supports the interlinking process between company entities in different datasets.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133885488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Explainable Entity Resolution Algorithms for Small Business Data using SystemER","authors":"Kun Qian, D. Burdick, Sairam Gurajada, Lucian Popa","doi":"10.1145/3336499.3338010","DOIUrl":"https://doi.org/10.1145/3336499.3338010","url":null,"abstract":"The 2019 FEIII CALI data challenge aims at linking different representations of the same real-world entities across multiple public datasets that collect identification and activity data about small to medium enterprises (SMEs) in California. We formalize this challenge as a learning-based entity resolution (ER) task, the goal of which is to learn a high-precision and high-recall pair-wise ER model that classifies small business entity pairs into matches and non-matches. Realistic ER tasks usually involve a pipeline of laborintensive and error-prone tasks, such as data preprocesing, gathering of training data, feature engineering, and model tuning. In this task, we apply an advanced human-in-the-loop system, named SystemER, to learn ER algorithms for SME entities. Powered by active learning and via a carefully designed user interface, SystemER can learn high-quality explainable ER algorithms with low human effort, while achieving high-accuracy on the datasets provided by the FEIII CALI data challenge.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131243681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative cryptocurrency trading: exploring the use of machine learning techniques","authors":"Giuseppe Attanasio, Luca Cagliero, P. Garza, Elena Baralis","doi":"10.1145/3336499.3338003","DOIUrl":"https://doi.org/10.1145/3336499.3338003","url":null,"abstract":"Machine learning techniques have found application in the study and development of quantitative trading systems. These systems usually exploit supervised models trained on historical data in order to automatically generate buy/sell signals on the financial markets. Although in this context a deep exploration of the Stock, Forex, and Future exchange markets has already been made, a more limited effort has been devoted to the application of machine learning techniques to the emerging cryptocurrency exchange market. This paper explores the potential of the most established classification and time series forecasting models in cryptocurrency trading by backtesting model performance over a eight year period. The results show that, due to the heterogeneity and volatility of the underlying financial instruments, prediction models based on series forecasting perform better than classification techniques. 
Furthermore, trading multiple cryptocurrencies at the same time significantly increases the overall returns compared to baseline strategies exclusively based on Bitcoin trading.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121943310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Financial Entity Identification and Information Integration (FEIII) 2019 Challenge: The Report of the Organizing Committee","authors":"L. Raschid, D. Burdick, C. D. Pablo, M. Flood, John Grant, J. Langsam, J. Pujara, Elena Tomas, I. Soboroff","doi":"10.1145/3336499.3338014","DOIUrl":"https://doi.org/10.1145/3336499.3338014","url":null,"abstract":"This report presents the goals and outcomes of the 2019 Financial Entity Identification and Information Integration (FEIII) Challenge. We describe two challenge datasets and tasks. FEIII SHIP was a bill of lading dataset for incoming shipments to the United States and the task was to identify the major shippers for some product, from some country. FEIII CALI included state and local regulatory data from California, and the task was entity linkage for San Francisco restaurants. The report summarizes plans for the 2020 Challenge and the Business Open Knowledge Network (BOKN).","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125515059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nearest Neighbor Search for Unsupervised Entity Linkage","authors":"Qingchen Wang","doi":"10.1145/3336499.3338011","DOIUrl":"https://doi.org/10.1145/3336499.3338011","url":null,"abstract":"In this FEIII challenge competitors are tasked with matching small businesses in California across a number of data collections containing corporate registrations, sales tax permits, liquor licenses and restaurant inspections. We present match results using a nearest neighbor search via text-based similarity metrics on the businesses' primary names and street addresses. Manual inspection of the quality of matches shows promising matches.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128804522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Online Semi-NMF Algorithm for Soft-Clustering of Financial Institutions","authors":"Yuan Cheng, Shawn Mankad","doi":"10.1145/3336499.3338005","DOIUrl":"https://doi.org/10.1145/3336499.3338005","url":null,"abstract":"In this paper we develop and propose an online semi-non-negative matrix factorization framework to cluster firms by their stock returns. The model is motivated by an accounting balance sheet identity, where one of the estimated matrix factors can be seen as the percentage of holdings across different asset classes (stocks, bonds, etc.) for each firm -- an important input for risk analysis. We also show that our model is an extension of soft K-means clustering. To enhance the practical value of the proposed model (OSNMF), we also develop a fast estimation framework that can be readily applied to cluster firms in real-time as new data becomes available. The model is validated using synthetic and real data. Specifically, we apply our technique to recover asset holdings of mutual funds and ETFs from stock returns and show our estimates closely match their disclosed balance sheets.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125768577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shipment Supplier Inference Using Topic Modeling","authors":"Chi-hung Chen","doi":"10.1145/3336499.3338013","DOIUrl":"https://doi.org/10.1145/3336499.3338013","url":null,"abstract":"This research applies Latent Dirichlet Allocation on United States Automated Manifest System Bill of Lading data. We define a \"bag of word\" where each Harmonized tariff code represents a document, each shipper name be a token and count of shipments to be element of matrix. The result shows that topic model is able to classify some shippers of the same industries.","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122049175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extensible and Scalable Entity Resolution for Financial Datasets Using RLTK","authors":"Yixiang Yao, Pedro A. Szekely, J. Pujara","doi":"10.1145/3336499.3338008","DOIUrl":"https://doi.org/10.1145/3336499.3338008","url":null,"abstract":"","PeriodicalId":148424,"journal":{"name":"Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114073750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}