Deep reinforcement learning for solving the stochastic e-waste collection problem

Impact factor: 6.0 | Zone 2, Management | JCR Q1, Operations Research & Management Science

European Journal of Operational Research Pub Date : 2025-05-04 DOI:10.1016/j.ejor.2025.04.033

Dang Viet Anh Nguyen, Aldy Gunawan, Mustafa Misir, Lim Kwan Hui, Pieter Vansteenwegen


Citations: 0

Abstract

With the growing influence of the internet and information technology, Electrical and Electronic Equipment (EEE) has become a gateway to technological innovations. However, discarded devices, also called e-waste, pose a significant threat to the environment and human health if not properly treated, disposed of, or recycled. In this study, we extend a novel model for e-waste collection in an urban context: the Heterogeneous Vehicle Routing Problem (VRP) with Multiple Time Windows and Stochastic Travel Times (HVRP-MTWSTT). We propose a solution method that employs deep reinforcement learning to guide local search heuristics (DRL-LSH). The contributions of this paper are as follows: (1) HVRP-MTWSTT is the first stochastic VRP in the context of the e-waste collection problem, incorporating complex constraints such as multiple time windows across a multi-period horizon with a heterogeneous vehicle fleet; (2) the DRL-LSH model uses deep reinforcement learning to provide an online adaptive operator selection layer, which selects the appropriate heuristic based on the search state. Computational experiments demonstrate that DRL-LSH outperforms the state-of-the-art hyperheuristic method by 24.26% on large-scale benchmark instances, with the performance gap increasing as the problem size grows. Additionally, to demonstrate the capability of DRL-LSH in addressing real-world problems, we tested it against reference metaheuristic and hyperheuristic algorithms on a real-world e-waste collection case study in Singapore. The results show that DRL-LSH significantly outperforms the reference algorithms on this real-world instance in terms of operating profit.
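The idea of an online adaptive operator selection layer can be illustrated with a small sketch. This is not the authors' DRL-LSH: the abstract does not specify the state features, reward, network, or operator set, so everything below is an assumption made for illustration. A tabular Q-learning agent stands in for the deep network, the problem is a toy closed-tour instance rather than HVRP-MTWSTT, and the names `make_instance`, `search`, and `OPERATORS` are hypothetical.

```python
import math
import random


def make_instance(n, seed=0):
    """Random points in the unit square, returned as a distance matrix."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return [[math.hypot(ax - bx, ay - by) for bx, by in pts] for ax, ay in pts]


def tour_length(tour, dist):
    """Length of the closed tour visiting nodes in the given order."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def swap(tour, rng):
    """Exchange two randomly chosen positions."""
    i, j = rng.sample(range(len(tour)), 2)
    new = tour[:]
    new[i], new[j] = new[j], new[i]
    return new


def two_opt(tour, rng):
    """Reverse a randomly chosen segment."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def relocate(tour, rng):
    """Remove one node and reinsert it at a random position."""
    new = tour[:]
    node = new.pop(rng.randrange(len(new)))
    new.insert(rng.randrange(len(new) + 1), node)
    return new


OPERATORS = [swap, two_opt, relocate]  # assumed operator set, not from the paper


def search(dist, iters=2000, eps=0.1, alpha=0.2, gamma=0.9, seed=0):
    """Local search in which Q-learning picks the operator at each step."""
    rng = random.Random(seed)
    current = list(range(len(dist)))
    rng.shuffle(current)
    cur_cost = start_cost = tour_length(current, dist)
    # Assumed search state: 1 if the previous move improved the tour, else 0.
    q = [[0.0] * len(OPERATORS) for _ in range(2)]
    state = 0
    for _ in range(iters):
        # Epsilon-greedy choice of operator given the current search state.
        if rng.random() < eps:
            action = rng.randrange(len(OPERATORS))
        else:
            action = max(range(len(OPERATORS)), key=lambda k: q[state][k])
        cand = OPERATORS[action](current, rng)
        cand_cost = tour_length(cand, dist)
        improved = cand_cost < cur_cost
        reward = max(0.0, cur_cost - cand_cost)  # assumed reward: cost reduction
        next_state = 1 if improved else 0
        # Standard one-step Q-learning update.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        if improved:  # accept improving moves only
            current, cur_cost = cand, cand_cost
        state = next_state
    return current, cur_cost, start_cost, q


if __name__ == "__main__":
    dist = make_instance(20, seed=1)
    best, cost, start, q = search(dist, seed=2)
    print(f"start cost {start:.3f}, final cost {cost:.3f}")
    print("learned Q-values per state:", q)
```

In a fuller treatment, the two-valued state would be replaced by a richer description of the search (for example, recent improvement history), and the Q-table by a neural network, which is presumably closer in spirit to what the abstract describes.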


Source journal

European Journal of Operational Research (Management Science: Operations Research and Management Science)

CiteScore: 11.90

Self-citation rate: 9.40%

Articles published: 786

Review time: 8.2 months

Journal description: The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.