{"title":"A deep reinforcement learning trader without offline training","authors":"Boian Lazov","doi":"10.1016/j.asoc.2025.113881","DOIUrl":null,"url":null,"abstract":"<div><div>In this paper we pursue the question of a fully online trading algorithm (i.e. one that does not need offline training on previously gathered data). For this task we consider Double Deep <span><math><mi>Q</mi></math></span>-learning in the episodic setting with Fast Learning Networks approximating the expected reward <span><math><mi>Q</mi></math></span>. Additionally, we define the possible terminal states of an episode in such a way as to introduce a mechanism to conserve some of the money in the trading pool when market conditions are seen as unfavourable. Some of these money are taken as profit and some are reused at a later time according to certain criteria. After describing the algorithm, we test it using 1-minute-tick price data for 4 major cryptocurrencies from Binance. We see that the agent performs better than trading with randomly chosen actions on each timestep. And it does so when tested on the whole dataset for a given market as well as on different subsets, representing different market trends.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"185 ","pages":"Article 113881"},"PeriodicalIF":6.6000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494625011949","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In this paper we pursue the question of a fully online trading algorithm, i.e. one that does not need offline training on previously gathered data. For this task we consider Double Deep Q-learning in the episodic setting, with Fast Learning Networks approximating the expected reward Q. Additionally, we define the possible terminal states of an episode so as to introduce a mechanism that conserves some of the money in the trading pool when market conditions are seen as unfavourable. Some of this money is taken as profit and some is reused at a later time according to certain criteria. After describing the algorithm, we test it using 1-minute-tick price data for 4 major cryptocurrencies from Binance. We find that the agent performs better than trading with randomly chosen actions at each timestep, and it does so both when tested on the whole dataset for a given market and on different subsets representing different market trends.
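For orientation, the Double Q-learning update underpinning the approach can be sketched as below. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and plain callables stand in for the paper's Fast Learning Network approximators. The key idea is that the online network selects the next action while the target network evaluates it, and terminal states (such as the paper's money-conserving episode exits) do not bootstrap.

```python
import numpy as np

def double_q_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    """Compute the Double DQN regression target for one transition.

    q_online and q_target are hypothetical stand-ins for the paper's
    Fast Learning Networks: callables mapping a state to a vector of
    action values.
    """
    if done:
        # Terminal states contribute only the immediate reward.
        return reward
    # Action selection uses the online network ...
    best_action = int(np.argmax(q_online(next_state)))
    # ... while evaluation uses the target network, decoupling the two
    # and reducing the overestimation bias of standard Q-learning.
    return reward + gamma * q_target(next_state)[best_action]

# Toy usage with random value functions over 3 actions.
rng = np.random.default_rng(0)
q_on = lambda s: rng.normal(size=3)
q_tg = lambda s: rng.normal(size=3)
print(double_q_target(reward=0.01, next_state=None, done=False,
                      q_online=q_on, q_target=q_tg))
```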
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is on publishing the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.