{"title":"The Efficient Tail Hypothesis: An Extreme Value Perspective on Market Efficiency","authors":"Junshu Jiang, Jordan Richards, Raphaël Huser, David Bolin","doi":"arxiv-2408.06661","DOIUrl":"https://doi.org/arxiv-2408.06661","url":null,"abstract":"In econometrics, the Efficient Market Hypothesis posits that asset prices reflect all available information in the market. Several empirical investigations show that market efficiency drops when it undergoes extreme events. Many models for multivariate extremes focus on positive dependence, making them unsuitable for studying extremal dependence in financial markets where data often exhibit both positive and negative extremal dependence. To this end, we construct regular variation models on the entirety of $\\mathbb{R}^d$ and develop a bivariate measure for asymmetry in the strength of extremal dependence between adjacent orthants. Our directional tail dependence (DTD) measure allows us to define the Efficient Tail Hypothesis (ETH) -- an analogue of the Efficient Market Hypothesis -- for the extremal behaviour of the market. Asymptotic results for estimators of DTD are described, and we discuss testing of the ETH via permutation-based methods and present novel tools for visualization. Empirical study of China's futures market leads to a rejection of the ETH and we identify potential profitable investment opportunities. To promote the research of microstructure in China's derivatives market, we open-source our high-frequency data, which are being collected continuously from multiple derivative exchanges.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM Approach","authors":"Haowei Ni, Shuchen Meng, Xupeng Chen, Ziqing Zhao, Andi Chen, Panfeng Li, Shiyao Zhang, Qifu Yin, Yuanqing Wang, Yuxi Chan","doi":"arxiv-2408.06634","DOIUrl":"https://doi.org/arxiv-2408.06634","url":null,"abstract":"Accurate stock market predictions following earnings reports are crucial for investors. Traditional methods, particularly classical machine learning models, struggle with these predictions because they cannot effectively process and interpret extensive textual data contained in earnings reports and often overlook nuances that influence market movements. This paper introduces an advanced approach by employing Large Language Models (LLMs) instruction fine-tuned with a novel combination of instruction-based techniques and quantized low-rank adaptation (QLoRA) compression. Our methodology integrates 'base factors', such as financial metric growth and earnings transcripts, with 'external factors', including recent market indices performances and analyst grades, to create a rich, supervised dataset. This comprehensive dataset enables our models to achieve superior predictive performance in terms of accuracy, weighted F1, and Matthews correlation coefficient (MCC), especially evident in the comparison with benchmarks such as GPT-4. We specifically highlight the efficacy of the llama-3-8b-Instruct-4bit model, which showcases significant improvements over baseline models. The paper also discusses the potential of expanding the output capabilities to include a 'Hold' option and extending the prediction horizon, aiming to accommodate various investment styles and time frames. This study not only demonstrates the power of integrating cutting-edge AI with fine-tuned financial data but also paves the way for future research in enhancing AI-driven financial analysis tools.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Investment Model","authors":"Jian Guo, Heung-Yeung Shum","doi":"arxiv-2408.10255","DOIUrl":"https://doi.org/arxiv-2408.10255","url":null,"abstract":"Traditional quantitative investment research is encountering diminishing returns alongside rising labor and time costs. To overcome these challenges, we introduce the Large Investment Model (LIM), a novel research paradigm designed to enhance both performance and efficiency at scale. LIM employs end-to-end learning and universal modeling to create an upstream foundation model capable of autonomously learning comprehensive signal patterns from diverse financial data spanning multiple exchanges, instruments, and frequencies. These \"global patterns\" are subsequently transferred to downstream strategy modeling, optimizing performance for specific tasks. We detail the system architecture design of LIM, address the technical challenges inherent in this approach, and outline potential directions for future research. The advantages of LIM are demonstrated through a series of numerical experiments on cross-instrument prediction for commodity futures trading, leveraging insights from stock markets.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A GCN-LSTM Approach for ES-mini and VX Futures Forecasting","authors":"Nikolas Michael, Mihai Cucuringu, Sam Howison","doi":"arxiv-2408.05659","DOIUrl":"https://doi.org/arxiv-2408.05659","url":null,"abstract":"We propose a novel data-driven network framework for forecasting problems related to E-mini S&P 500 and CBOE Volatility Index futures, in which products with different expirations act as distinct nodes. We provide visual demonstrations of the correlation structures of these products in terms of their returns, realized volatility, and trading volume. The resulting networks offer insights into the contemporaneous movements across the different products, illustrating how inherently connected the movements of the future products belonging to these two classes are. These networks are further utilized by a multi-channel Graph Convolutional Network to enhance the predictive power of a Long Short-Term Memory network, allowing for the propagation of forecasts of highly correlated quantities, combining the temporal with the spatial aspect of the term structure.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative analysis of stationarity for Bitcoin and the S&P500","authors":"Yaoyue Tang, Karina Arias-Calluari, Michael S. Harré","doi":"arxiv-2408.02973","DOIUrl":"https://doi.org/arxiv-2408.02973","url":null,"abstract":"This paper compares and contrasts stationarity between the conventional stock market and cryptocurrency. The dataset used for the analysis is the intraday price indices of the S&P500 from 1996 to 2023 and the intraday Bitcoin indices from 2019 to 2023, both in USD. We adopt the definition of `wide sense stationary', which constrains the time independence of the first and second moments of a time series. The testing method used in this paper follows the Wiener-Khinchin Theorem, i.e., that for a wide sense stationary process, the power spectral density and the autocorrelation are a Fourier transform pair. We demonstrate that localized stationarity can be achieved by truncating the time series into segments, and for each segment, detrending and normalizing the price return are required. These results show that the S&P500 price return can achieve stationarity for the full 28-year period with a detrending window of 12 months and a constrained normalization window of 10 minutes. With truncated segments, a larger normalization window can be used to establish stationarity, indicating that within the segment the data is more homogeneous. For Bitcoin price return, the segment with higher volatility presents stationarity with a normalization window of 60 minutes, whereas stationarity cannot be established in other segments.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeuralFactors: A Novel Factor Learning Approach to Generative Modeling of Equities","authors":"Achintya Gopal","doi":"arxiv-2408.01499","DOIUrl":"https://doi.org/arxiv-2408.01499","url":null,"abstract":"The use of machine learning for statistical modeling (and thus, generative modeling) has grown in popularity with the proliferation of time series models, text-to-image models, and especially large language models. Fundamentally, the goal of classical factor modeling is statistical modeling of stock returns, and in this work, we explore using deep generative modeling to enhance classical factor models. Prior work has explored the use of deep generative models in order to model hundreds of stocks, leading to accurate risk forecasting and alpha portfolio construction; however, that specific model does not allow for easy factor modeling interpretation in that the factor exposures cannot be deduced. In this work, we introduce NeuralFactors, a novel machine-learning based approach to factor analysis where a neural network outputs factor exposures and factor returns, trained using the same methodology as variational autoencoders. We show that this model outperforms prior approaches both in terms of log-likelihood performance and computational efficiency. Further, we show that this method is competitive to prior work in generating realistic synthetic data, covariance estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization. Finally, due to the connection to classical factor analysis, we analyze how the factors our model learns cluster together and show that the factor exposures could be used for embedding stocks.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"193 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeuralBeta: Estimating Beta Using Deep Learning","authors":"Yuxin Liu, Jimin Lin, Achintya Gopal","doi":"arxiv-2408.01387","DOIUrl":"https://doi.org/arxiv-2408.01387","url":null,"abstract":"Traditional approaches to estimating beta in finance often involve rigid assumptions and fail to adequately capture beta dynamics, limiting their effectiveness in use cases like hedging. To address these limitations, we have developed a novel method using neural networks called NeuralBeta, which is capable of handling both univariate and multivariate scenarios and tracking the dynamic behavior of beta. To address the issue of interpretability, we introduce a new output layer inspired by regularized weighted linear regression, which provides transparency into the model's decision-making process. We conducted extensive experiments on both synthetic and market data, demonstrating NeuralBeta's superior performance compared to benchmark methods across various scenarios, especially instances where beta is highly time-varying, e.g., during regime shifts in the market. This model not only represents an advancement in the field of beta estimation, but also shows potential for applications in other financial contexts that assume linear relationships.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"189 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring financial stock returns correlation from complex network analysis","authors":"Ixandra Achitouv","doi":"arxiv-2407.20380","DOIUrl":"https://doi.org/arxiv-2407.20380","url":null,"abstract":"Financial stock returns correlations have been studied in the prism of random matrix theory, to distinguish the signal from the \"noise\". Eigenvalues of the matrix that are above the rescaled Marchenko Pastur distribution can be interpreted as collective modes behavior while the modes under are usually considered as noise. In this analysis we use complex network analysis to simulate the \"noise\" and the \"market\" component of the return correlations, by introducing some meaningful correlations in simulated geometric Brownian motion for the stocks. We find that the returns correlation matrix is dominated by stocks with high eigenvector centrality and clustering found in the network. We then use simulated \"market\" random walks to build an optimal portfolio and find that the overall return performs better than using the historical mean-variance data, up to 50% on short time scale.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive Learning of Asset Embeddings from Financial Time Series","authors":"Rian Dolphin, Barry Smyth, Ruihai Dong","doi":"arxiv-2407.18645","DOIUrl":"https://doi.org/arxiv-2407.18645","url":null,"abstract":"Representation learning has emerged as a powerful paradigm for extracting valuable latent features from complex, high-dimensional data. In financial domains, learning informative representations for assets can be used for tasks like sector classification, and risk management. However, the complex and stochastic nature of financial markets poses unique challenges. We propose a novel contrastive learning framework to generate asset embeddings from financial time series data. Our approach leverages the similarity of asset returns over many subwindows to generate informative positive and negative samples, using a statistical sampling strategy based on hypothesis testing to address the noisy nature of financial data. We explore various contrastive loss functions that capture the relationships between assets in different ways to learn a discriminative representation space. Experiments on real-world datasets demonstrate the effectiveness of the learned asset embeddings on benchmark industry classification and portfolio optimization tasks. In each case our novel approaches significantly outperform existing baselines highlighting the potential for contrastive learning to capture meaningful and actionable relationships in financial data.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Financial Statement Analysis with Large Language Models","authors":"Alex Kim, Maximilian Muhn, Valeri Nikolaev","doi":"arxiv-2407.17866","DOIUrl":"https://doi.org/arxiv-2407.17866","url":null,"abstract":"We investigate whether an LLM can successfully perform financial statement analysis in a way similar to a professional human analyst. We provide standardized and anonymous financial statements to GPT4 and instruct the model to analyze them to determine the direction of future earnings. Even without any narrative or industry-specific information, the LLM outperforms financial analysts in its ability to predict earnings changes. The LLM exhibits a relative advantage over human analysts in situations when the analysts tend to struggle. Furthermore, we find that the prediction accuracy of the LLM is on par with the performance of a narrowly trained state-of-the-art ML model. LLM prediction does not stem from its training memory. Instead, we find that the LLM generates useful narrative insights about a company's future performance. Lastly, our trading strategies based on GPT's predictions yield a higher Sharpe ratio and alphas than strategies based on other models. Taken together, our results suggest that LLMs may take a central role in decision-making.","PeriodicalId":501139,"journal":{"name":"arXiv - QuantFin - Statistical Finance","volume":"70 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141774179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}