Policy weighting via discounted Thompson sampling for non-stationary market-making

Óscar Fernández Vicente, Javier García, Fernando Fernández

Artificial Intelligence Review, vol. 58, no. 10, published 2025-07-19. DOI: 10.1007/s10462-025-11312-9
https://link.springer.com/article/10.1007/s10462-025-11312-9
Citations: 0
Abstract
Market makers are essential participants in every financial market. They provide liquidity to the system by placing buy and sell orders at multiple price levels. While performing this task, they aim to earn profit and manage inventory levels simultaneously. However, financial markets are not stationary environments; they constantly evolve, influenced by changes in participants, the occurrence of economic events, or shifts in market trading hours, among other factors. This study introduces a novel approach to the challenge of market-making in non-stationary financial markets using multi-objective Reinforcement Learning (RL). Traditional RL methods often struggle in non-stationary environments, as the learned optimal policy may not adapt to the new dynamics. We present Policy Weighting through Discounted Thompson Sampling (POW-dTS), a novel dynamic algorithm that adapts to changing market conditions by effectively weighting pre-trained policies across various contexts. Unlike some conventional methods, POW-dTS does not require additional artifacts such as change-point detection or transition models, making it robust against the unpredictability inherent in financial markets. Our approach focuses on optimizing trade profitability and managing inventory risk, the dual objectives of market makers. Through a detailed comparative analysis, we highlight the strengths and adaptability of POW-dTS against traditional techniques in non-stationary environments, demonstrating its potential to enhance market liquidity and efficiency.
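The abstract describes weighting pre-trained policies with discounted Thompson sampling, where older evidence is decayed so the sampler can track non-stationary rewards. The paper's exact POW-dTS formulation is not given here; the following is only an illustrative sketch of the underlying discounted-Thompson-sampling idea, treating each pre-trained policy as a Bernoulli bandit arm with a Beta posterior (all class and parameter names are hypothetical):

```python
import random


class DiscountedThompsonSampler:
    """Illustrative discounted Thompson sampling over a set of
    pre-trained policies, one Bernoulli arm per policy.

    Each arm keeps a Beta(alpha, beta) posterior. Before every
    update, all posteriors are multiplied by a discount factor so
    stale evidence fades, letting the sampler shift weight toward
    whichever policy performs best under the current regime.
    """

    def __init__(self, n_policies, discount=0.95, prior=1.0):
        self.discount = discount
        self.alpha = [prior] * n_policies  # pseudo-counts of successes
        self.beta = [prior] * n_policies   # pseudo-counts of failures

    def select(self):
        # Sample a plausible success rate for each policy from its
        # posterior, then act with the policy whose sample is highest.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, chosen, reward):
        # Decay every arm's evidence, then credit the chosen arm.
        # `reward` is assumed to be normalized into [0, 1], e.g. a
        # binarized profit-and-loss signal for the trading step.
        for i in range(len(self.alpha)):
            self.alpha[i] *= self.discount
            self.beta[i] *= self.discount
        self.alpha[chosen] += reward
        self.beta[chosen] += 1.0 - reward
```

With `discount=1.0` this reduces to standard Thompson sampling; values below 1 bound the effective sample size, which is what allows adaptation after a regime change without any explicit change-point detector.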
About the journal
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.