{"title":"Matrix Evolutions: Synthetic Correlations and Explainable Machine Learning for Constructing Robust Investment Portfolios","authors":"Jochen Papenbrock, Peter Schwendner, Markus Jaeger, Stephan Krügel","doi":"10.2139/ssrn.3663220","DOIUrl":"https://doi.org/10.2139/ssrn.3663220","url":null,"abstract":"In this article, the authors present a novel and highly flexible concept to simulate correlation matrixes of financial markets. It produces realistic outcomes regarding stylized facts of empirical correlation matrixes and requires no asset return input data. The matrix generation is based on a multiobjective evolutionary algorithm, so the authors call the approach matrix evolutions. It is suitable for parallel implementation and can be accelerated by graphics processing units and quantum-inspired algorithms. The approach is useful for backtesting, pricing, and hedging correlation-dependent investment strategies and financial products. Its potential is demonstrated in a machine learning case study for robust portfolio construction in a multi-asset universe: An explainable machine learning program links the synthetic matrixes to the portfolio volatility spread of hierarchical risk parity versus equal risk contribution. TOPICS: Statistical methods, big data/machine learning, portfolio construction, performance measurement Key Findings ▪ The authors introduce the matrix evolutions concept based on an evolutionary algorithm to simulate correlation matrixes useful for financial market applications. ▪ They apply the resulting synthetic correlation matrixes to benchmark hierarchical risk parity (HRP) and equal risk contribution allocations of a multi-asset futures portfolio and find HRP to show lower portfolio risk. 
▪ The authors evaluate three competing machine learning methods to regress the portfolio risk spread between both allocation methods against statistical features of the synthetic correlation matrixes and then discuss the local and global feature importance using the SHAP framework by Lundberg and Lee (2017).","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122840835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperparameter Optimization for Portfolio Selection","authors":"P. Nystrup, Erik Lindström, H. Madsen","doi":"10.3905/jfds.2020.1.035","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.035","url":null,"abstract":"Portfolio selection involves a trade-off between maximizing expected return and minimizing risk. In practice, useful formulations also include various costs and constraints that regularize the problem and reduce the risk due to estimation errors, resulting in solutions that depend on a number of hyperparameters. As the number of hyperparameters grows, selecting their value becomes increasingly important and difficult. In this article, the authors propose a systematic approach to hyperparameter optimization by leveraging recent advances in automated machine learning and multiobjective optimization. They optimize hyperparameters on a train set to yield the best result subject to market-determined realized costs. In applications to single- and multiperiod portfolio selection, they show that sequential hyperparameter optimization finds solutions with better risk–return trade-offs than manual, grid, and random search over hyperparameters using fewer function evaluations. At the same time, the solutions found are more stable from in-sample training to out-of-sample testing, suggesting they are less likely to be extremities that randomly happened to yield good performance in training. TOPICS: Portfolio theory, portfolio construction, big data/machine learning Key Findings • The growing number of applications of machine-learning approaches to portfolio selection means that hyperparameter optimization becomes increasingly important. We propose a systematic approach to hyperparameter optimization by leveraging recent advances in automated machine learning and multiobjective optimization. • We establish a connection between forecast uncertainty and holding- and trading-cost parameters in portfolio selection. 
We argue that they should be considered regularization parameters that can be adjusted in training to achieve optimal performance when tested subject to realized costs. • We show that multiobjective optimization can find solutions with better risk–return trade-offs than manual, grid, and random search over hyperparameters for portfolio selection. At the same time, the solutions are more stable across in-sample training and out-of-sample testing.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132841920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Cross Section of Commodity Returns: A Nonparametric Approach","authors":"C. Struck, Enoch Cheng","doi":"10.3905/jfds.2020.1.034","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.034","url":null,"abstract":"To what extent are financial market returns predictable? Standard approaches to asset pricing make strong parametric assumptions that undermine their return-predicting ability. The authors employ tree-based methods to overcome these limitations and attempt to approximate an upper bound for the predictability of returns in commodities futures markets. Out of sample, they find that up to 3.74% of 1-month returns are predictable—more than a 10-fold increase from standard approaches. The findings hint at the importance multiway interactions and nonlinearities acquire in the data; they imply that new factors should be tested on their ability to add explanatory power to an ensemble of existing factors. TOPICS: Futures and forward contracts, commodities Key Findings • Standard approaches to asset pricing make strong parametric assumptions that undermine their return-predicting ability. • The authors employ tree-based methods to overcome these limitations and estimate the predictability of returns in commodities futures markets. • Out of sample, they find that up to 3.74% of 1-month returns are predictable—more than a 10-fold increase from standard approaches.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130656238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Sequence Modeling: Development and Applications in Asset Pricing","authors":"Lingbo Cong, Ke Tang, Jingyuan Wang, Yang Zhang","doi":"10.3905/jfds.2020.1.053","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.053","url":null,"abstract":"The authors predict asset returns and measure risk premiums using a prominent technique from artificial intelligence: deep sequence modeling. Because asset returns often exhibit sequential dependence that may not be effectively captured by conventional time-series models, sequence modeling offers a promising path with its data-driven approach and superior performance. In this article, the authors first overview the development of deep sequence models, introduce their applications in asset pricing, and discuss their advantages and limitations. They then perform a comparative analysis of these methods using data on US equities. They demonstrate how sequence modeling benefits investors in general through incorporating complex historical path dependence and that long short-term memory–based models tend to have the best out-of-sample performance. TOPICS: Big data/machine learning, security analysis and valuation, performance measurement Key Findings ▪ This article provides a concise synopsis of deep sequence modeling with an emphasis on its historical development in the field of computer science and artificial intelligence. It serves as a reference source for social scientists who aim to use the tool to supplement conventional time-series and panel methods. ▪ Deep sequence models can be adapted successfully for asset pricing, especially in predicting asset returns, which allow the model to be flexible to capture the high-dimensionality, nonlinear, interactive, low signal-to-noise, and dynamic nature of financial data. In particular, the model’s ability to detect path-dependence patterns makes it versatile and effective, potentially outperforming existing models. 
▪ This article provides a horse-race comparison of various deep sequence models for the tasks of forecasting returns and measuring risk premiums. Long short-term memory has the best performance in terms of out-of-sample predictive R2, and long short-term memory with an attention mechanism has the best portfolio performance when excluding microcap stocks.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134487544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Greedy Online Classification of Persistent Market States Using Realized Intraday Volatility Features","authors":"P. Nystrup, Petter N. Kolm, Erik Lindström","doi":"10.2139/ssrn.3594875","DOIUrl":"https://doi.org/10.2139/ssrn.3594875","url":null,"abstract":"In many financial applications, it is important to classify time-series data without any latency while maintaining persistence in the identified states. The authors propose a greedy online classifier that contemporaneously determines which hidden state a new observation belongs to without the need to parse historical observations and without compromising persistence. Their classifier is based on the idea of clustering temporal features while explicitly penalizing jumps between states by a fixed-cost regularization term that can be calibrated to achieve a desired level of persistence. Through a series of return simulations, the authors show that in most settings their new classifier remarkably obtains a higher accuracy than the correctly specified maximum likelihood estimator. They illustrate that the new classifier is more robust to misspecification and yields state sequences that are significantly more persistent both in and out of sample. They demonstrate how classification accuracy can be further improved by including features that are based on intraday data. Finally, the authors apply the new classifier to estimate persistent states of the S&P 500 Index. TOPICS: Statistical methods, simulations, big data/machine learning Key Findings • A new greedy online classifier is proposed that contemporaneously determines which hidden state a new observation belongs to without the need to parse historical observations and without compromising temporal persistence. • A series of simulations demonstrates that the new classifier frequently obtains a higher accuracy and is more robust to misspecification than the correctly specified maximum likelihood estimator. 
• Classification accuracy can be improved by including features that are based on intraday volatility data.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"42 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123471248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portfolio Selection Using Portfolio Committees","authors":"Tsungwu Ho","doi":"10.2139/ssrn.3653595","DOIUrl":"https://doi.org/10.2139/ssrn.3653595","url":null,"abstract":"The author proposes a committee approach to portfolio selection. Because each optimal portfolio is a combination of three basic elements—strategy, covariance matrix, and risk type—the author first augments the combination to 250 optimal portfolios at each estimation period. The author then defines a score to select the best portfolio to hold in the next period. Survival of the fittest, the superior performance of the combination portfolio, demonstrates that the committee approach to portfolio selection is not only effective but also easy to implement. TOPICS: Portfolio theory, portfolio construction Key Findings • This article proposes a flexible and easy-to-implement committee approach to portfolio selection. • This article defines an algorithm that proposes a score to select the best portfolio out of 250 augmented portfolios. • In survival of the fittest, evidence from several datasets shows that the resulting combination portfolio overcomes the distributional uncertainty and exhibits superior annualized performance.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130406284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inside the Mind of Investors During the COVID-19 Pandemic: Evidence from the StockTwits Data","authors":"Hasan Fallahgoul","doi":"10.2139/ssrn.3583462","DOIUrl":"https://doi.org/10.2139/ssrn.3583462","url":null,"abstract":"The authors study investor beliefs—sentiment and disagreement—about stock market returns during the COVID-19 pandemic using a large number of investor messages, about 3.7 million, on a social media investing platform, StockTwits. The rich and multimodal features of StockTwits data allow the authors to explore the evolution of sentiment and disagreement within and across investors, sectors, and even industries. The authors find that sentiment (disagreement) has a sharp decrease (increase) across all investors with any investment philosophy, horizon, and experience between February 19, 2020, and March 23, 2020, when a historical market high was followed by a record drop. Surprisingly, these measures have a sharp reversal toward the end of March. However, the performance of these measures across various sectors is heterogeneous. Financial and healthcare sectors are the most pessimistic and optimistic divisions, respectively. TOPICS: Security analysis and valuation, quantitative methods, big data/machine learning, financial crises and financial market history, performance measurement Key Findings ▪ Daily time series of the sentiment and disagreement is not a stationary process. ▪ Sentiment (disagreement) has a sharp decrease (increase) across all investors with any investment philosophy, horizon, and experience between February 19, 2020, and March 23, 2020, when a historical market high was followed by a record drop. 
▪ The financial and healthcare sectors are the most pessimistic and optimistic divisions, respectively.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125009825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"It’s All About Data: How to Make Good Decisions in a World Awash with Information","authors":"Mehrzad Mahdavi, Hossein Kazemi","doi":"10.3905/jfds.2020.1.025","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.025","url":null,"abstract":"The rise of big and alternative data has created significant new business opportunities in the financial sector. As we start on this journey of fast-moving technology disruption, financial professionals have a rare opportunity to balance the exponential growth of artificial intelligence (AI)/data science with ethics, bias, and privacy to create trusted data-driven decision making. In this article, the authors discuss the nuances of big data sets that are critical when one considers standards, processes, best practices, and modeling algorithms for the deployment of AI systems. In addition, this industry is widely guided by a fiduciary standard that puts the interests of the client above all else. It is therefore critical to have a thorough understanding of the limitations of our knowledge, because there are many known unknowns and unknown unknowns that can have a significant impact on outcomes. The authors emphasize key success factors for the deployment of AI initiatives: talent and bridging the skills gap. To achieve a lasting impact of big data initiatives, multidisciplinary teams with well-defined roles need to be established with continuing training and education. The prize is the finance of the future. TOPICS: Simulations, big data/machine learning Key Findings • The rise of alternative data in finance is creating major opportunities in all areas of the financial industry, including risk management, portfolio construction, investment banking, and insurance. • To build trusted outcomes in AI/ML initiatives, financial professionals’ roles are critical. Given the many nuances in using big data, there is a need for vetted protocols and methods in selecting data sets and algorithms. 
Best practices and guidelines are effective in reducing the risks of using AI/ML, including overfitting, lack of interpretability, biased inputs, and unethical use of data. • Given the major shortage of talent in AI/data science in finance, practical training of employees and continued education are keys to scale roll out to enable future of finance.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132745407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCA for Implied Volatility Surfaces","authors":"M. Avellaneda, Brian F. Healy, A. Papanicolaou, G. Papanicolaou","doi":"10.3905/jfds.2020.1.032","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.032","url":null,"abstract":"Principal component analysis (PCA) is a useful tool when trying to construct factor models from historical asset returns. For the implied volatilities of US equities, there is a PCA-based model with a principal eigenportfolio whose return time series lies close to that of an overarching market factor. The authors show that this market factor is the index resulting from the daily compounding of a weighted average of implied-volatility returns, with weights based on the options’ open interest and Vega. The authors also analyze the singular vectors derived from the tensor structure of the implied volatilities of S&P 500 constituents and find evidence indicating that some type of open interest- and Vega-weighted index should be one of at least two significant factors in this market. TOPICS: Statistical methods, simulations, big data/machine learning Key Findings • Principal component analysis of a comprehensive dataset of implied volatility surfaces from options on US equities shows that their collective behavior is captured by just nine factors, whereas the effective spatial dimension of the residuals is closer to 500 than to the nominal dimension of 28,000, revealing the large redundancy in the data. • Portfolios of implied volatility surface returns, weighed suitably by open interest and Vega, track the principal eigenportfolio associated with a market portfolio of options, in analogy to equity portfolios. 
• Retention of the tensor structure in the eigenportfolio analysis improves the tracking between the open interest–Vega weighted (tensor) implied volatility surface returns portfolio and the (tensor) eigenportfolio, indicating that data structure matters.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126511353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing Editor’s Letter","authors":"Francesco A. Fabozzi","doi":"10.3905/jfds.2020.2.1.001","DOIUrl":"https://doi.org/10.3905/jfds.2020.2.1.001","url":null,"abstract":"The four issues of the 2019 inaugural publication of The Journal of Financial Data Science by all metrics indicate the success of the journal. Four of the articles published in JFDS were in the top 10 most downloaded articles across the Portfolio Management Research (PMR) platform. This is quite an accomplishment considering that JFDS represented just one year of articles. After publication of the first issue, articles in JFDS were featured in an opinion piece on the challenges of implementing machine learning by David Stevenson (“Machine Learning Revolution is Still Some Way Off”) published in the Financial Times. One of the articles in the inaugural issue is highlighted by Bill Kelly, the CEO of the CAIA Association, in an August 2019 blog (“Whatfore Art Thou Use of Alt-Data?”) in AllAboutAlpha. The Financial Data Professional Institute (FDPI), established by the CAIA Association, will be adopting at least five articles from JFDS as required reading for their membership exams. As researchers in this space produce papers, our expectation is that the journal will be well cited. In the first issue of Volume 2, there are nine articles which are summarized below. “Machine Learning in Asset Management—Part 1: Portfolio Construction—Trading Strategies” is the first in a series of articles by Derek Snow dealing with machine learning in asset management. The series will cover the applications to the major tasks of asset management: (1) portfolio construction, (2) risk management, (3) capital management, (4) infrastructure and deployment, and (5) sales and marketing. Portfolio construction is divided into trading and weight optimization. 
The primary focus of the current article is on how machine learning can be used to improve various types of trading strategies, while weight optimization is the subject of the next article in the series. Snow classifies trading strategies according to their respective machine-learning frameworks (i.e., reinforcement, supervised and unsupervised learning). He then explains the difference between reinforcement learning and supervised learning, both conceptually and in relation to their respective advantages and disadvantages. Global equity and bond asset management require techniques that also put effort into understanding the structure of the interactions. Network analysis offers asset managers insightful information regarding factor-based connectedness, relationships, and how risk is transferred between network components. Gueorgui Konstantinov and Mario Rusev demonstrate the relation between global equity and bond funds from a network perspective. In their article, “The Bond–Equity–Fund Relation Using the Fama–French–Carhart Factors: A Practical Network Approach,” they show the advantages of graph theory to explain the collective","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132549573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}