A deep reinforcement learning trader without offline training

IF 6.6 | CAS Tier 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Boian Lazov
{"title":"一个没有离线训练的深度强化学习交易者","authors":"Boian Lazov","doi":"10.1016/j.asoc.2025.113881","DOIUrl":null,"url":null,"abstract":"<div><div>In this paper we pursue the question of a fully online trading algorithm (i.e. one that does not need offline training on previously gathered data). For this task we consider Double Deep <span><math><mi>Q</mi></math></span>-learning in the episodic setting with Fast Learning Networks approximating the expected reward <span><math><mi>Q</mi></math></span>. Additionally, we define the possible terminal states of an episode in such a way as to introduce a mechanism to conserve some of the money in the trading pool when market conditions are seen as unfavourable. Some of these money are taken as profit and some are reused at a later time according to certain criteria. After describing the algorithm, we test it using 1-minute-tick price data for 4 major cryptocurrencies from Binance. We see that the agent performs better than trading with randomly chosen actions on each timestep. And it does so when tested on the whole dataset for a given market as well as on different subsets, representing different market trends.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"185 ","pages":"Article 113881"},"PeriodicalIF":6.6000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A deep reinforcement learning trader without offline training\",\"authors\":\"Boian Lazov\",\"doi\":\"10.1016/j.asoc.2025.113881\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In this paper we pursue the question of a fully online trading algorithm (i.e. one that does not need offline training on previously gathered data). For this task we consider Double Deep <span><math><mi>Q</mi></math></span>-learning in the episodic setting with Fast Learning Networks approximating the expected reward <span><math><mi>Q</mi></math></span>. Additionally, we define the possible terminal states of an episode in such a way as to introduce a mechanism to conserve some of the money in the trading pool when market conditions are seen as unfavourable. Some of these money are taken as profit and some are reused at a later time according to certain criteria. After describing the algorithm, we test it using 1-minute-tick price data for 4 major cryptocurrencies from Binance. We see that the agent performs better than trading with randomly chosen actions on each timestep. 
And it does so when tested on the whole dataset for a given market as well as on different subsets, representing different market trends.</div></div>\",\"PeriodicalId\":50737,\"journal\":{\"name\":\"Applied Soft Computing\",\"volume\":\"185 \",\"pages\":\"Article 113881\"},\"PeriodicalIF\":6.6000,\"publicationDate\":\"2025-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Soft Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1568494625011949\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494625011949","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In this paper we pursue the question of a fully online trading algorithm, i.e. one that does not need offline training on previously gathered data. For this task we consider Double Deep Q-learning in the episodic setting, with Fast Learning Networks approximating the expected reward Q. Additionally, we define the possible terminal states of an episode so as to introduce a mechanism that conserves some of the money in the trading pool when market conditions are seen as unfavourable. Some of this money is taken as profit and some is reused at a later time according to certain criteria. After describing the algorithm, we test it using 1-minute-tick price data for four major cryptocurrencies from Binance. We see that the agent performs better than trading with randomly chosen actions on each timestep, both when tested on the whole dataset for a given market and on different subsets representing different market trends.
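To make the core update concrete, below is a minimal sketch of the Double Q-learning target the abstract refers to: one network selects the greedy next action, a second network evaluates it. Note the assumptions: the paper approximates Q with Fast Learning Networks, whereas this sketch substitutes a small PyTorch MLP as a generic stand-in; the names `QNet`, `double_q_target`, and `online_step` are illustrative, and the trading-specific state features, reward definition, and terminal-state money-conservation mechanism are not specified in the abstract, so they are not implemented here.

```python
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Generic Q-network mapping state features to one Q-value per action
    (e.g. buy / hold / sell). A stand-in for the paper's Fast Learning
    Network, which this sketch does not implement."""

    def __init__(self, n_features: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def double_q_target(online: QNet, target: QNet, reward: torch.Tensor,
                    next_state: torch.Tensor, done: torch.Tensor,
                    gamma: float = 0.99) -> torch.Tensor:
    """Double Q-learning target: the online net selects the greedy action,
    the target net evaluates it, reducing the overestimation bias of
    plain Q-learning."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
        # 'done' masks out the bootstrap term at episode-terminal states.
        return reward + gamma * (1.0 - done) * next_q


def online_step(online: QNet, target: QNet, opt: torch.optim.Optimizer,
                s, a, r, s_next, done, gamma: float = 0.99) -> float:
    """One fully online update: each new market tick yields a transition
    that is learned from immediately, with no offline pre-training."""
    q_sa = online(s).gather(1, a.view(-1, 1)).squeeze(1)
    y = double_q_target(online, target, r, s_next, done, gamma)
    loss = nn.functional.mse_loss(q_sa, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In a Double DQN setup the target network's weights are periodically synchronised with the online network's; how often that happens here, and how the paper's terminal-state money-conservation criteria map onto the `done` signal, is detailed in the full text rather than the abstract.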
Source journal

Applied Soft Computing (Engineering & Technology – Computer Science: Interdisciplinary Applications)

CiteScore: 15.80
Self-citation rate: 6.90%
Annual article output: 874
Review time: 10.9 months

Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.