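Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management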

Q1 Economics, Econometrics and Finance

Intelligent Systems in Accounting, Finance and Management. Published: 2025-06-19. DOI: 10.1002/isaf.70008

Eduardo C. Garrido-Merchán, Sol Mora-Figueroa, María Coronado-Vaca

Volume 32, Issue 2

Citations: 0

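Abstract

Financial portfolio management seeks to maximize several objectives over a trading period. These objectives relate not only to the risk and performance of the portfolio but also to other criteria such as its environmental, social, and governance (ESG) score. Unfortunately, classic methods such as the Markowitz model consider only the risk and performance of the portfolio and ignore its ESG score. Moreover, the assumptions this model makes about financial returns make it unsuitable for highly volatile markets such as the technology sector. This paper investigates the application of deep reinforcement learning (DRL) to ESG financial portfolio management. DRL agents avoid these limitations of classic models because they make no assumptions such as normally distributed returns. They can also exploit any information, such as the ESG score, provided their reward is configured to improve the corresponding objective. However, the performance of DRL agents is highly variable and very sensitive to the values of their hyperparameters. Bayesian optimization is a class of methods suited to optimizing black-box functions, that is, functions whose analytical expression is unknown and which are noisy and expensive to evaluate. The hyperparameter tuning problem of DRL algorithms fits this scenario well. Training an agent for even a single objective is very expensive and requires millions of timesteps. For this reason, we do not optimize one objective that mixes a risk-performance metric with an ESG metric. Instead, we keep the objectives separate and solve the multi-objective problem. This yields a Pareto-optimal set of portfolios representing the best trade-offs between the Sharpe ratio and the mean ESG score of the portfolio, and it leaves the choice of the final portfolio to the investor. We conducted our experiments using OpenAI Gym environments adapted from the FinRL platform. The experiments cover the Dow Jones Industrial Average (DJIA) and NASDAQ markets and measure the Sharpe ratio achieved by the agent and the mean ESG score of the portfolio. We compare the obtained Pareto sets in terms of hypervolume, illustrating how the portfolios represent the best trade-offs between the Sharpe ratio and the mean ESG score. Finally, we demonstrate the usefulness of the proposed methodology by comparing the obtained hypervolume with that achieved by random search over the DRL hyperparameter space.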
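The two objectives named above are the Sharpe ratio achieved by the agent and the mean ESG score of the portfolio. Both can be computed from a backtest. The Python sketch below is illustrative only, and it rests on several assumptions. Returns are daily and annualized with the common sqrt(252) convention. The risk-free rate is constant per period. The "mean ESG score" is taken to be the daily weight-averaged ESG score of the holdings, averaged over the trading period. The paper's exact definitions may differ, and the function names `sharpe_ratio` and `mean_esg_score` are hypothetical.

```python
import numpy as np


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of periodic (e.g. daily) portfolio returns.

    Assumption: constant per-period risk-free rate and sqrt(periods_per_year)
    annualization, a common convention for daily data.
    """
    excess = np.asarray(returns, dtype=float) - risk_free_rate
    std = excess.std(ddof=1)  # sample standard deviation of excess returns
    if std == 0.0:
        return 0.0  # undefined for a riskless series; reported as 0 in this sketch
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


def mean_esg_score(weights, esg_scores):
    """Average over trading days of the weight-averaged portfolio ESG score.

    weights: shape (T, N) or (N,), non-negative holdings per day and asset.
    esg_scores: shape (N,), one ESG score per asset.
    Assumption: each day's weights are normalized to sum to 1.
    """
    w = np.atleast_2d(np.asarray(weights, dtype=float))
    w = w / w.sum(axis=1, keepdims=True)                 # normalize each day's allocation
    daily_esg = w @ np.asarray(esg_scores, dtype=float)  # one ESG score per day
    return float(daily_esg.mean())


daily_returns = np.random.default_rng(0).normal(0.0005, 0.01, size=252)  # synthetic data
print(sharpe_ratio(daily_returns))
print(sharpe_ratio([1.0, 3.0], periods_per_year=1))  # ≈ 1.4142 (sqrt(2))
print(mean_esg_score([[0.5, 0.5], [0.25, 0.75]], [60.0, 80.0]))  # → 72.5
```

In a DRL setting, the agent's daily portfolio weights and the resulting daily returns from an evaluation episode would be passed to these two functions. Each trained agent then maps to a single (Sharpe ratio, mean ESG score) point.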


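The evaluation scores each optimizer by the hypervolume of the Pareto set it finds in the (Sharpe ratio, mean ESG score) plane, with random search over the DRL hyperparameter space as the baseline. The Python sketch below shows three pieces under stated assumptions: a non-dominated filter, an exact two-dimensional hypervolume for two maximized objectives, and a random-search loop. Several elements are hypothetical and are not the paper's setup. These are the search space (a log10 learning rate and a discount factor `gamma`), the reference point (0, 0), and `toy_evaluate`, which is a cheap synthetic function standing in for training and backtesting a DRL agent. A Bayesian optimizer would replace `random_search`, and its evaluated points would be scored with the same `hypervolume_2d` call.

```python
import numpy as np


def pareto_front(points):
    """Non-dominated rows of `points` when every objective is maximized."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if another point is >= in all objectives and > in at least one
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front of 2-D `points`, bounded below by `ref`."""
    ref = np.asarray(ref, dtype=float)
    front = pareto_front(points)
    front = front[np.all(front > ref, axis=1)]  # keep only points that beat the reference
    front = front[np.argsort(-front[:, 0])]     # objective 1 descending => objective 2 ascending
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)     # add one horizontal slab of the staircase
        prev_y = y
    return float(area)


def random_search(evaluate, space, n_trials, seed=0):
    """Baseline: sample each hyperparameter uniformly within its bounds and evaluate."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_trials):
        hp = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        results.append(evaluate(hp))
    return np.array(results)


def toy_evaluate(hp):
    """Hypothetical stand-in for 'train a DRL agent with hp, return (Sharpe, mean ESG)'."""
    x, g = hp["log10_learning_rate"], hp["gamma"]
    sharpe = 1.5 - 0.2 * (x + 4.0) ** 2 - 20.0 * (g - 0.95) ** 2
    esg = 50.0 + 200.0 * (g - 0.9) - 5.0 * (x + 5.0)
    return sharpe, esg


space = {"log10_learning_rate": (-5.0, -3.0), "gamma": (0.9, 0.999)}
evaluated = random_search(toy_evaluate, space, n_trials=50)
print(pareto_front(evaluated))
print(hypervolume_2d(evaluated, ref=(0.0, 0.0)))
```

Hypervolume depends on the reference point. Two methods' Pareto sets are therefore only comparable when both are scored against the same `ref`. The exact sweep above is specific to two objectives, and more objectives would need a general hypervolume algorithm.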


Source journal: Intelligent Systems in Accounting, Finance and Management (Economics, Econometrics and Finance: Finance)

CiteScore: 6.00

Self-citation rate: 0.00%

Journal description: Intelligent Systems in Accounting, Finance and Management is a quarterly international journal that publishes original, high-quality material dealing with all aspects of intelligent systems as they relate to accounting, economics, finance, marketing, and management. The journal is also concerned with related emerging technologies, including big data, business intelligence, social media, and other technologies. It encourages the development of novel technologies and the embedding of new and existing technologies into applications of real, practical value, so implementation issues are of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies, and business fields. It also aims to reach advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is its use of two groups of reviewers: those who specialize in intelligent systems work and those who specialize in application areas. Reviewers are asked to address originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address potential impact at some level in their submissions to the journal.