Machine learning tree trimming for faster Markov reward game solutions

IF 3.7 · CAS Tier 3 (Computer Science) · JCR Q2, Computer Science, Interdisciplinary Applications
Burhaneddin İzgi , Murat Özkaya , Nazım Kemal Üre , Matjaž Perc
{"title":"机器学习树修剪更快的马尔可夫奖励游戏解决方案","authors":"Burhaneddin İzgi ,&nbsp;Murat Özkaya ,&nbsp;Nazım Kemal Üre ,&nbsp;Matjaž Perc","doi":"10.1016/j.jocs.2025.102726","DOIUrl":null,"url":null,"abstract":"<div><div>Existing methodologies for solving Markov reward games mostly rely on state–action frameworks and iterative algorithms to address these challenges. However, these approaches often impose significant computational burdens, particularly when applied to large-scale games, due to their inherent complexity and the need for extensive iterative calculations. In this paper, we propose a new neural network architecture for solving Markov reward games in the form of a decision tree with relatively large state and action sets, such as 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, by trimming the decision tree. In this context, we generate datasets of Markov reward games with sizes ranging from <span><math><mrow><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>3</mn></mrow></msup></mrow></math></span> to <span><math><mrow><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>5</mn></mrow></msup></mrow></math></span> using the holistic matrix norm-based solution method and obtain the necessary components, such as the payoff matrices and the corresponding solutions of the games, for training the neural network. We then propose a vectorization process to prepare the outcomes of the matrix norm-based solution method and adapt them for training the proposed neural network. The neural network is trained using both the vectorized payoff and transition matrices as input, and the prediction system generates the optimal strategy set as output. In the model, we approach the problem as a classification task by labeling the optimal and non-optimal branches of the decision tree with ones and zeros, respectively, to identify the most rewarding paths of each game. As a result, we propose a novel neural network architecture for solving Markov reward games in real time, enhancing its practicality for real-world applications. The results reveal that the system efficiently predicts the optimal paths for each decision tree, with f1-scores slightly greater than 0.99, 0.99, and 0.97 for Markov reward games with 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, respectively.</div></div>","PeriodicalId":48907,"journal":{"name":"Journal of Computational Science","volume":"92 ","pages":"Article 102726"},"PeriodicalIF":3.7000,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Machine learning tree trimming for faster Markov reward game solutions\",\"authors\":\"Burhaneddin İzgi ,&nbsp;Murat Özkaya ,&nbsp;Nazım Kemal Üre ,&nbsp;Matjaž Perc\",\"doi\":\"10.1016/j.jocs.2025.102726\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Existing methodologies for solving Markov reward games mostly rely on state–action frameworks and iterative algorithms to address these challenges. However, these approaches often impose significant computational burdens, particularly when applied to large-scale games, due to their inherent complexity and the need for extensive iterative calculations. In this paper, we propose a new neural network architecture for solving Markov reward games in the form of a decision tree with relatively large state and action sets, such as 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, by trimming the decision tree. 
In this context, we generate datasets of Markov reward games with sizes ranging from <span><math><mrow><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>3</mn></mrow></msup></mrow></math></span> to <span><math><mrow><mn>1</mn><msup><mrow><mn>0</mn></mrow><mrow><mn>5</mn></mrow></msup></mrow></math></span> using the holistic matrix norm-based solution method and obtain the necessary components, such as the payoff matrices and the corresponding solutions of the games, for training the neural network. We then propose a vectorization process to prepare the outcomes of the matrix norm-based solution method and adapt them for training the proposed neural network. The neural network is trained using both the vectorized payoff and transition matrices as input, and the prediction system generates the optimal strategy set as output. In the model, we approach the problem as a classification task by labeling the optimal and non-optimal branches of the decision tree with ones and zeros, respectively, to identify the most rewarding paths of each game. As a result, we propose a novel neural network architecture for solving Markov reward games in real time, enhancing its practicality for real-world applications. The results reveal that the system efficiently predicts the optimal paths for each decision tree, with f1-scores slightly greater than 0.99, 0.99, and 0.97 for Markov reward games with 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, respectively.</div></div>\",\"PeriodicalId\":48907,\"journal\":{\"name\":\"Journal of Computational Science\",\"volume\":\"92 \",\"pages\":\"Article 102726\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-10-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computational Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1877750325002030\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Science","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1877750325002030","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Existing methodologies for solving Markov reward games mostly rely on state–action frameworks and iterative algorithms. However, these approaches often impose significant computational burdens, particularly when applied to large-scale games, due to their inherent complexity and the need for extensive iterative calculations. In this paper, we propose a new neural network architecture for solving Markov reward games in the form of a decision tree with relatively large state and action sets, such as 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, by trimming the decision tree. In this context, we generate datasets of Markov reward games with sizes ranging from 10³ to 10⁵ using the holistic matrix norm-based solution method and obtain the necessary components, such as the payoff matrices and the corresponding solutions of the games, for training the neural network. We then propose a vectorization process to prepare the outcomes of the matrix norm-based solution method and adapt them for training the proposed neural network. The neural network is trained using both the vectorized payoff and transition matrices as input, and the prediction system generates the optimal strategy set as output. In the model, we approach the problem as a classification task by labeling the optimal and non-optimal branches of the decision tree with ones and zeros, respectively, to identify the most rewarding paths of each game. As a result, we propose a novel neural network architecture for solving Markov reward games in real time, enhancing its practicality for real-world applications. The results reveal that the system efficiently predicts the optimal paths for each decision tree, with F1-scores slightly greater than 0.99, 0.99, and 0.97 for Markov reward games with 2-actions-3-stages, 3-actions-3-stages, and 4-actions-3-stages, respectively.
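The classification framing described in the abstract can be illustrated with a small sketch. The following is a hedged, minimal PyTorch example and not the authors' implementation: the game size (2-actions-3-stages), the flattened payoff/transition layout, the network shape, and the synthetic training data are all assumptions made for illustration, standing in for the datasets the paper generates with the holistic matrix norm-based solution method. It only shows the idea of mapping vectorized payoff and transition matrices to per-branch optimal (1) / non-optimal (0) labels.

# Hedged sketch: not the authors' implementation. Illustrates the paper's
# classification framing: vectorized payoff/transition matrices in,
# per-branch optimal (1) / non-optimal (0) labels out.
import torch
import torch.nn as nn

# Illustrative sizes for a hypothetical 2-actions-3-stages game:
# 2^3 = 8 root-to-leaf paths in the decision tree (assumption for this sketch).
N_ACTIONS, N_STAGES = 2, 3
N_PATHS = N_ACTIONS ** N_STAGES                      # one binary label per path
PAYOFF_DIM = N_ACTIONS * N_ACTIONS * N_STAGES        # flattened stage payoff matrices (assumed layout)
TRANS_DIM = N_ACTIONS * N_ACTIONS * (N_STAGES - 1)   # flattened transition matrices (assumed layout)
INPUT_DIM = PAYOFF_DIM + TRANS_DIM

class BranchClassifier(nn.Module):
    """Small MLP that scores each decision-tree path; sigmoid > 0.5 marks it optimal."""
    def __init__(self, in_dim: int, n_paths: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_paths),              # raw logits, one per path
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Synthetic stand-in data; the paper instead generates games and labels them
# with the holistic matrix norm-based solution method.
torch.manual_seed(0)
X = torch.randn(1000, INPUT_DIM)                     # vectorized payoff + transition matrices
y = (torch.rand(1000, N_PATHS) < 0.2).float()        # 1 = optimal branch, 0 = non-optimal

model = BranchClassifier(INPUT_DIM, N_PATHS)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()                     # per-branch binary classification

for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Predicted optimal paths for one game: indices where sigmoid(logit) > 0.5.
with torch.no_grad():
    pred = (torch.sigmoid(model(X[:1])) > 0.5).nonzero(as_tuple=True)[1]
print("predicted optimal path indices:", pred.tolist())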
Source journal
Journal of Computational Science
Categories: Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
CiteScore: 5.50
Self-citation rate: 3.00%
Annual publication count: 227
Average review time: 41 days
Journal description: Computational Science is a rapidly growing multi- and interdisciplinary field that uses advanced computing and data analysis to understand and solve complex problems. It has reached a level of predictive capability that now firmly complements the traditional pillars of experimentation and theory. The recent advances in experimental techniques such as detectors, on-line sensor networks and high-resolution imaging techniques have opened up new windows into physical and biological processes at many levels of detail. The resulting data explosion allows for detailed data-driven modeling and simulation.

This new discipline in science combines computational thinking, modern computational methods, devices and collateral technologies to address problems far beyond the scope of traditional numerical methods. Computational science typically unifies three distinct elements:

• Modeling, Algorithms and Simulations (e.g. numerical and non-numerical, discrete and continuous);
• Software developed to solve science (e.g., biological, physical, and social), engineering, medicine, and humanities problems;
• Computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components (e.g. problem solving environments).