A K-means Supported Reinforcement Learning Framework to Multi-dimensional Knapsack

IF 1.8 · CAS Zone 3 (Mathematics) · JCR Q1 (Mathematics)

Journal of Global Optimization · Pub Date: 2024-02-15 · DOI: 10.1007/s10898-024-01364-6

Sabah Bushaj, İ. Esra Büyüktahtakın

Citations: 0

Abstract

In this paper, we address the difficulty of solving large-scale multi-dimensional knapsack instances (MKP), presenting a novel deep reinforcement learning (DRL) framework. In this DRL framework, we train different agents compatible with a discrete action space for sequential decision-making while still satisfying any resource constraint of the MKP. This novel framework incorporates the decision variable values in the 2D DRL where the agent is responsible for assigning a value of 1 or 0 to each of the variables. To the best of our knowledge, this is the first DRL model of its kind in which a 2D environment is formulated, and an element of the DRL solution matrix represents an item of the MKP. Our framework is configured to solve MKP instances of different dimensions and distributions. We propose a K-means approach to obtain an initial feasible solution that is used to train the DRL agent. We train four different agents in our framework and present the results comparing each of them with the CPLEX commercial solver. The results show that our agents can learn and generalize over instances with different sizes and distributions. Our DRL framework shows that it can solve medium-sized instances at least 45 times faster in CPU solution time and at least 10 times faster for large instances, with a maximum solution gap of 0.28% compared to the performance of CPLEX. Furthermore, at least 95% of the items are predicted in line with the CPLEX solution. Computations with DRL also provide a better optimality gap with respect to state-of-the-art approaches.
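The two quality measures quoted in the abstract (solution gap relative to a reference solver, and the share of items assigned the same 0/1 value as the reference) can be illustrated on a toy instance. The following is a minimal sketch, not the authors' code: the instance data, the function names, and the candidate vector `x_cand` are hypothetical, and exhaustive enumeration stands in for CPLEX, which is only practical for a handful of items.

```python
from itertools import product


def is_feasible(x, weights, capacities):
    """True if the 0/1 vector x satisfies every resource constraint."""
    return all(
        sum(w * xi for w, xi in zip(row, x)) <= cap
        for row, cap in zip(weights, capacities)
    )


def objective(x, profits):
    """Total profit of the selected items."""
    return sum(p * xi for p, xi in zip(profits, x))


def brute_force_optimum(profits, weights, capacities):
    """Enumerate all 0/1 vectors; a stand-in for an exact solver (assumption)."""
    best_x, best_val = None, -1
    for x in product((0, 1), repeat=len(profits)):
        if is_feasible(x, weights, capacities):
            val = objective(x, profits)
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val


def gap_percent(reference_value, candidate_value):
    """Relative shortfall of the candidate against the reference, in percent."""
    return 100.0 * (reference_value - candidate_value) / reference_value


def agreement_percent(x_ref, x_cand):
    """Share of items given the same 0/1 value in both solutions, in percent."""
    same = sum(a == b for a, b in zip(x_ref, x_cand))
    return 100.0 * same / len(x_ref)


# Hypothetical toy instance: 4 items, 2 resource constraints.
profits = [10, 13, 7, 8]
weights = [[3, 4, 2, 3],   # resource 1 usage per item
           [2, 3, 4, 1]]   # resource 2 usage per item
capacities = [7, 6]

x_ref, v_ref = brute_force_optimum(profits, weights, capacities)

# Hypothetical output of a learned agent (hand-picked for illustration).
x_cand = (1, 0, 0, 1)
v_cand = objective(x_cand, profits)

gap = gap_percent(v_ref, v_cand)
agreement = agreement_percent(x_ref, x_cand)
```

On this toy instance the candidate is feasible but far from the reference; the abstract's figures (gap of at most 0.28%, agreement of at least 95%) refer to the authors' benchmark instances, not to this example.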

Source journal

Journal of Global Optimization (Mathematics, Applied Mathematics)

CiteScore: 0.10

Self-citation rate: 5.60%

Articles published: 137

Review time: 6 months

About the journal: The Journal of Global Optimization publishes carefully refereed papers that encompass theoretical, computational, and applied aspects of global optimization. While the focus is on original research contributions dealing with the search for global optima of non-convex, multi-extremal problems, the journal’s scope covers optimization in the widest sense, including nonlinear, mixed integer, combinatorial, stochastic, robust, multi-objective optimization, computational geometry, and equilibrium problems. Relevant works on data-driven methods and optimization-based data mining are of special interest. In addition to papers covering theory and algorithms of global optimization, the journal publishes significant papers on numerical experiments, new testbeds, and applications in engineering, management, and the sciences. Applications of particular interest include healthcare, computational biochemistry, energy systems, telecommunications, and finance. Apart from full-length articles, the journal features short communications on both open and solved global optimization problems. It also offers reviews of relevant books and publishes special issues.