{"title":"SURF: improving classifiers in production by learning from busy and noisy end users","authors":"J. Lockhart, Samuel A. Assefa, Ayham Alajdad, Andrew Alexander, T. Balch, M. Veloso","doi":"10.1145/3383455.3422547","DOIUrl":"https://doi.org/10.1145/3383455.3422547","url":null,"abstract":"Supervised learning classifiers inevitably make mistakes in production, perhaps mis-labeling an email, or flagging an otherwise routine transaction as fraudulent. It is vital that the end users of such a system are provided with a means of relabeling data points that they deem to have been mislabeled. The classifier can then be retrained on the relabeled data points in the hope of performance improvement. To reduce noise in this feedback data, well known algorithms from the crowdsourcing literature can be employed. However, the feedback setting provides a new challenge: how do we know what to do in the case of user non-response? If a user provides us with no feedback on a label then it can be dangerous to assume they implicitly agree: a user can be busy, lazy, or no longer a user of the system! We show that conventional crowdsourcing algorithms struggle in this user feedback setting, and present a new algorithm, SURF, that can cope with this non-response ambiguity.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121246819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CryptoCredit: securely training fair models","authors":"Leo de Castro, Jiahao Chen, Antigoni Polychroniadou","doi":"10.1145/3383455.3422567","DOIUrl":"https://doi.org/10.1145/3383455.3422567","url":null,"abstract":"When developing models for regulated decision making, sensitive features like age, race and gender cannot be used and must be obscured from model developers to prevent bias. However, the remaining features still need to be tested for correlation with sensitive features, which can only be done with the knowledge of those features. We resolve this dilemma using a fully homomorphic encryption scheme, allowing model developers to train linear regression and logistic regression models and test them for possible bias without ever revealing the sensitive features in the clear. We demonstrate how it can be applied to leave-one-out regression testing, and show using the adult income data set that our method is practical to run.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127288251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards self-regulating AI: challenges and opportunities of AI model governance in financial services","authors":"Eren Kurshan, Hongda Shen, Jiahao Chen","doi":"10.1145/3383455.3422564","DOIUrl":"https://doi.org/10.1145/3383455.3422564","url":null,"abstract":"AI systems have found a wide range of application areas in financial services. Their involvement in broader and increasingly critical decisions has escalated the need for compliance and effective model governance. Current governance practices have evolved from more traditional financial applications and modeling frameworks. They often struggle with the fundamental differences in AI characteristics such as uncertainty in the assumptions, and the lack of explicit programming. AI model governance frequently involves complex review flows and relies heavily on manual steps. As a result, it faces serious challenges in effectiveness, cost, complexity, and speed. Furthermore, the unprecedented rate of growth in the AI model complexity raises questions on the sustainability of the current practices. This paper focuses on the challenges of AI model governance in the financial services industry. As a part of the outlook, we present a system-level framework towards increased self-regulation for robustness and compliance. This approach aims to enable potential solution opportunities through increased automation and the integration of monitoring, management, and mitigation capabilities. The proposed framework also provides model governance and risk management improved capabilities to manage model risk during deployment.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114524567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Paying down metadata debt: learning the representation of concepts using topic models","authors":"Jiahao Chen, M. Veloso","doi":"10.1145/3383455.3422537","DOIUrl":"https://doi.org/10.1145/3383455.3422537","url":null,"abstract":"We introduce a data management problem called metadata debt, to identify the mapping between data concepts and their logical representations. We describe how this mapping can be learned using semisupervised topic models based on low-rank matrix factorizations that account for missing and noisy labels, coupled with sparsity penalties to improve localization and interpretability. We introduce a gauge transformation approach that allows us to construct explicit associations between topics and concept labels, and thus assign meaning to topics. We also show how to use this topic model for semisupervised learning tasks like extrapolating from known labels, evaluating possible errors in existing labels, and predicting missing features. We show results from this topic model in predicting subject tags on over 25,000 datasets from Kaggle.com, demonstrating the ability to learn semantically meaningful features.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Choosing news topics to explain stock market returns","authors":"P. Glasserman, K. Krstovski, Paul Laliberte, Harry Mamaysky","doi":"10.1145/3383455.3422557","DOIUrl":"https://doi.org/10.1145/3383455.3422557","url":null,"abstract":"We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic EM algorithm will often overfit returns to the detriment of the topic model. We obtain better out-of-sample performance through a random search of plain LDA models. A branching procedure that reinforces effective topic assignments often performs best. We test these methods on an archive of over 90,000 news articles about S&P 500 firms.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123879690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep reinforcement learning for automated stock trading: an ensemble strategy","authors":"Hongyang Yang, Xiao-Yang Liu, Shanli Zhong, A. Walid","doi":"10.1145/3383455.3422540","DOIUrl":"https://doi.org/10.1145/3383455.3422540","url":null,"abstract":"Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose an ensemble strategy that employs deep reinforcement schemes to learn a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, thereby robustly adjusting to different market situations. In order to avoid the large memory consumption in training networks with continuous action space, we employ a load-on-demand technique for processing very large data. We test our algorithms on the 30 Dow Jones stocks that have adequate liquidity. The performance of the trading agent with different reinforcement learning algorithms is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble strategy is shown to outperform the three individual algorithms and two baselines in terms of the risk-adjusted return measured by the Sharpe ratio.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131916028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Navigating the dynamics of financial embeddings over time","authors":"Antonia Gogoglou, Brian Nguyen, A. Salimov, Jonathan Rider, C. B. Bruss","doi":"10.1145/3383455.3422552","DOIUrl":"https://doi.org/10.1145/3383455.3422552","url":null,"abstract":"Financial transactions constitute connections between entities and through these connections a large scale heterogeneous weighted graph is formulated. In this labyrinth of interactions that are continuously updated, there exists a variety of similarity-based patterns that can provide insights into the dynamics of the financial system. With the current work, we propose the application of Graph Representation Learning in a scalable dynamic setting as a means of capturing these patterns in a meaningful and robust way. We proceed to perform a rigorous qualitative analysis of the latent trajectories to extract real world insights from the proposed representations and their evolution over time that is to our knowledge the first of its kind in the financial sector. Shifts in the latent space are associated with known economic events and in particular the impact of the recent Covid-19 pandemic to consumer patterns. Capturing such patterns indicates the value added to financial modeling through the incorporation of latent graph representations.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"1988 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131090374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Risk-sensitive reinforcement learning: a martingale approach to reward uncertainty","authors":"N. Vadori, Sumitra Ganesh, P. Reddy, M. Veloso","doi":"10.1145/3383455.3422519","DOIUrl":"https://doi.org/10.1145/3383455.3422519","url":null,"abstract":"We introduce a novel framework to account for sensitivity to rewards uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool - the chaotic variation - which can rigorously be interpreted as the risk measure of the martingale component associated to the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114877062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating synthetic data in finance: opportunities, challenges and pitfalls","authors":"Samuel A. Assefa","doi":"10.1145/3383455.3422554","DOIUrl":"https://doi.org/10.1145/3383455.3422554","url":null,"abstract":"Financial services generate a huge volume of data that is extremely complex and varied. These datasets are often stored in silos within organisations for various reasons, including but not limited to regulatory requirements and business needs. As a result, data sharing within different lines of business as well as outside of the organisation (e.g. to the research community) is severely limited. It is therefore critical to investigate methods for synthesising financial datasets that follow the same properties of the real data while respecting the need for privacy of the parties involved. This introductory paper aims to highlight the growing need for effective synthetic data generation in the financial domain. We highlight three main areas of focus that are of particular importance while generating synthetic financial datasets: 1) Generating realistic synthetic datasets. 2) Measuring the similarities between real and generated datasets. 3) Ensuring the generative process satisfies any privacy constraints. Although these challenges are also present in other domains, the additional regulatory and privacy requirements within financial services present unique questions that are not asked elsewhere. Due to the size and influence of the financial services industry, answering these questions has the potential for a great and lasting impact. Finally, we aim to develop a shared vocabulary and context for generating synthetic financial data using two types of financial datasets as examples.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121473776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-agent reinforcement learning in a realistic limit order book market simulation","authors":"Michael Karpe, Jin Fang, Zhongyao Ma, Chen Wang","doi":"10.1145/3383455.3422570","DOIUrl":"https://doi.org/10.1145/3383455.3422570","url":null,"abstract":"Optimal order execution is widely studied by industry practitioners and academic researchers because it determines the profitability of investment decisions and high-level trading strategies, particularly those involving large volumes of orders. However, complex and unknown market dynamics pose significant challenges for the development and validation of optimal execution strategies. In this paper, we propose a model-free approach by training Reinforcement Learning (RL) agents in a realistic market simulation environment with multiple agents. First, we configure a multi-agent historical order book simulation environment for execution tasks built on an Agent-Based Interactive Discrete Event Simulation (ABIDES) [6]. Second, we formulate the problem of optimal execution in an RL setting where an intelligent agent can make order execution and placement decisions based on market microstructure trading signals in High Frequency Trading (HFT). Third, we develop and train an RL execution agent using the Double Deep Q-Learning (DDQL) algorithm in the ABIDES environment. In some scenarios, our RL agent converges towards a Time-Weighted Average Price (TWAP) strategy. Finally, we evaluate the simulation with our RL agent by comparing it with a market replay simulation using real market Limit Order Book (LOB) data.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127616651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}