{"title":"Finding a needle in a haystack: A machine learning framework for anomaly detection in payment systems","authors":"Ajit Desai , Anneke Kosse , Jacob Sharples","doi":"10.1016/j.jfds.2025.100163","DOIUrl":"10.1016/j.jfds.2025.100163","url":null,"abstract":"<div><div>We propose a flexible machine learning (ML) framework for real-time transaction monitoring in high-value payment systems (HVPS), which are central to a country’s financial infrastructure and integral to financial stability. This framework can be used by system operators and overseers to detect anomalous transactions, which—if caused by a cyber attack or an operational outage and left undetected—could have serious implications for the HVPS, its participants and the financial system more broadly. Given the high volume of payments settled each day and the scarcity of actual anomalous transactions in HVPS, detecting anomalies resembles finding a needle in a haystack. Therefore, our framework employs a layered approach to manage the high volume of payments and isolate potential anomalies. In the first layer, a supervised ML algorithm is used to identify and separate ‘typical’ payments from ‘unusual’ payments. In the second layer, only the ‘unusual’ payments are run through an unsupervised ML algorithm for anomaly detection. We test this framework using artificially manipulated transactions and payments data from the Canadian HVPS. The ML algorithm employed in the first layer achieves a detection rate of 93 %, marking a significant improvement over commonly-used econometric models. 
The ML algorithm used in the second layer marks the artificially manipulated transactions as nearly twice as suspicious as the original transactions, proving its effectiveness.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100163"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143891242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pairs trading with time-series deep learning models","authors":"Selin Yilmaz , Emre Sefer","doi":"10.1016/j.jfds.2026.100177","DOIUrl":"10.1016/j.jfds.2026.100177","url":null,"abstract":"<div><div>Pairs trading is a well-studied statistical arbitrage strategy including the identification of asset pairs exhibiting correlated changes in their historical prices. This statistical arbitrage strategy focuses on benefiting from non-permanent divergent behaviour of price, and it forecasts that the price relationship will revert to its usual and normal correlation. In this paper, we explore how more recent time-series-based deep learning techniques can be utilized in pairs trading, where cointegrated asset pairs are taken into account. We propose deep-learning and more traditional machine learning-based methods to predict the fluctuation of daily idiosyncratic residual terms between assets and their factor approximations. In our analysis, we focused on seven models: LSTM as a fundamental time-series method to capture interrelationships in dataset, Informer, Autoformer, iTransformer, Scaleformer, and Chronos as transformer-based deep time series methods, and AdaBoost, which is an ensemble learning-based machine learning method. We have assessed the performance of methods comprehensively over S&P 500 and cryptocurrency assets data starting from 2012 to 2020 respectively, and used a traditional statistical arbitrage-based relative value method as a baseline. All of our proposed learning-based methods turned out to be profitable strategies, obtaining higher Sharpe ratios and average returns by outperforming the baseline relative value method. Nevertheless, deep learning-based methods had a lower volume than the baseline, so when transaction costs are taken into account they showed better performance. Deep learning-based methods maximum drawdown was also lower than the traditional statistical arbitrage strategy. 
As a result, we show the benefits of time series-based deep learning methods in pairs trading across distinct asset classes.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100177"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147537866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding money launderers using heterogeneous graph neural networks","authors":"Fredrik Johannessen , Martin Jullum","doi":"10.1016/j.jfds.2025.100175","DOIUrl":"10.1016/j.jfds.2025.100175","url":null,"abstract":"<div><div>The finance industry depends on effective anti-money laundering (AML) systems to ensure compliance and maintain operational efficiency. However, existing AML systems, which are predominantly rule-based, frequently struggle to detect money laundering accurately.</div><div>In particular, their inability to learn from historical data and properly account for diverse customer behavior is problematic. Also accounting for the vast amounts of transactional data generated daily, this challenge calls for big data analytics and advanced machine learning techniques.</div><div>In line with this, the present paper explores a graph neural network (GNN) approach, a state-of-the-art machine learning technique, to identify money laundering activities within a large heterogeneous network constructed from real-world bank transactions and business role data from DNB, Norway’s largest bank.</div><div>To this end, we extend the (homogeneous) Message Passing Neural Network (MPNN) architecture to operate on a heterogeneous graph, and demonstrate its strong performance in detecting money laundering activities.</div><div>We showcase the suitability of utilizing GNN methodology to improve electronic surveillance systems for detecting money laundering, thereby contributing a pioneering approach to AML through the application of advanced data science techniques.</div><div>To the best of our knowledge, this is the first publication applying heterogeneous GNNs for AML purposes with a large real-world heterogeneous network.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100175"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145883669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Informed trading and expected returns","authors":"James J. Choi , Li Jin , Hongjun Yan","doi":"10.1016/j.jfds.2025.100174","DOIUrl":"10.1016/j.jfds.2025.100174","url":null,"abstract":"<div><div>Does information asymmetry affect the cross-section of expected stock returns? We explore this question using representative portfolio holdings data from the Shanghai Stock Exchange. We show that institutional investors have a strong information advantage, and that past aggressiveness of institutional trading in a stock positively predicts institutions’ future information advantage in this stock. Sorting stocks on this predictor and controlling for other correlates of expected returns, we find that the top quintile’s average annualized return in the next month is 10.8 % higher than the bottom quintile’s, indicating that information asymmetry increases expected returns.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100174"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145883671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GARCH-PDE models for option pricing under stochastic volatility and their finite difference solvers","authors":"Qi Wang , Lu Zhang , Qian Zhang","doi":"10.1016/j.jfds.2026.100176","DOIUrl":"10.1016/j.jfds.2026.100176","url":null,"abstract":"<div><div>This paper presents numerical solvers for generative and hybrid option pricing models that unify econometric and diffusion-based approaches. These models are formulated as systems of continuous partial differential equations (PDEs), with stochastic volatility updated at discrete reset dates according to generalized autoregressive conditional heteroskedasticity (GARCH) dynamics. In contrast to approaches that estimate volatility from option prices using the Black–Scholes model or Monte Carlo simulations, this method simplifies option pricing under stochastic volatility by exogenously supplying and updating latent intraday volatility using discrete return data. We then develop and analyze numerical techniques for solving the resulting system of parabolic partial differential equations, which feature time-varying diffusion coefficients governed by the stochastic volatility paths inferred from the return data. Convergence and stability analyses of the numerical schemes attest to the option pricing accuracy of the proposed framework using available discrete implied volatility samples without compromising its computational accuracy. 
Several sets of numerical tests using SPX data are presented to illustrate our approach and demonstrate its superiority over other empirically well-tested pricing methods.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100176"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147394554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Paper discussion at the third annual conference on capital market research in the era of AI","authors":"","doi":"10.1016/j.jfds.2025.100167","DOIUrl":"10.1016/j.jfds.2025.100167","url":null,"abstract":"","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100167"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145576048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dumb money? Social network attention herding, sentiment, and markets","authors":"Chengcheng Charlie Huang, Pauline Shum Nolan","doi":"10.1016/j.jfds.2025.100169","DOIUrl":"10.1016/j.jfds.2025.100169","url":null,"abstract":"<div><div>Wallstreetbets (WSB) is the perfect echo chamber to study retail investor behaviour and markets. We introduce a direct measure of individual stock attention and the concept of forum-wide attention herding. We fine-tune a large language model to classify investor sentiment. We find that WSB sentiment is inversely related to the VIX. In general, more individual stock attention leads to more stock purchases, and sentiment is a contrarian predictor of future returns. However, when attention herds on a stock with high user engagement, trades peak but there is no reversal in returns. Finally, our monthly attention herding portfolio generates sizable alphas.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100169"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145525690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients","authors":"Parisa Davar , Frédéric Godin , Jose Garrido","doi":"10.1016/j.jfds.2025.100165","DOIUrl":"10.1016/j.jfds.2025.100165","url":null,"abstract":"<div><div>This paper tackles the problem of mitigating catastrophic risk (which is risk with very low frequency but very high severity) in the context of a sequential decision making process. This problem is particularly challenging due to the scarcity of observations in the far tail of the distribution of cumulative costs (negative rewards). A policy gradient algorithm is developed, that we call POTPG. It is based on approximations of the tail risk derived from extreme value theory. Numerical experiments highlight the out-performance of our method over common benchmarks, relying on the empirical distribution. An application to financial risk management, more precisely to the dynamic hedging of a financial option, is presented.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100165"},"PeriodicalIF":3.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144878183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised generation of tradable topic indices through textual analysis","authors":"Marcel Lee , Alan Spark","doi":"10.1016/j.jfds.2025.100149","DOIUrl":"10.1016/j.jfds.2025.100149","url":null,"abstract":"<div><div>Stock returns are moved by many risk factors. Thematic stock indices try to represent these factors, but are limited by the fact that risk factors are not directly observable. This paper introduces a method to uncover hidden risk factors through text analysis. It applies the dynamic variant of the <em>Latent Dirichlet Allocation</em> (LDA) model to annual and quarterly reports to find a topic distribution for each stock. This is then interpreted as the risk factor partition and transformed into a standard normal basis which corresponds to pure risk factors. The weights indicate the proportions necessary to combine the equities into tradable topic indices. The need for human intervention is minimized by determining the optimal parameters automatically.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100149"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143454732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal rebalancing strategies reduce market variability","authors":"Helge Holden , Lars Holden","doi":"10.1016/j.jfds.2025.100151","DOIUrl":"10.1016/j.jfds.2025.100151","url":null,"abstract":"<div><div>The increasing fraction of passive funds influences stock market variability since passive investors behave differently than active investors. We demonstrate via simulations how portfolios that rebalance between different classes of assets influence the market variability. We prove that the optimal strategy for such portfolios when we include transaction costs, is only to rebalance when the portfolio leaves a no-trade region in the state space. This is the case also when the expectation and volatility of the prices are inhomogeneous. We show that portfolios that apply an optimal rebalance strategy reduce the variability in the stock market measured in the sum of the distances between local minimum and maximum of the prices in the stock market, also when these portfolios constitute only a small part of the market. However, the more usual rebalance strategies that only consider to rebalance at the end of a month or a quarter, have a much weaker influence on the market variability.</div></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"11 ","pages":"Article 100151"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143173405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}