Value-Based Reinforcement Learning for Selective Disassembly Sequence Optimization Problems: Demonstrating and Comparing a Proposed Model
Shujin Qin, Zhiliang Bi, Jiacun Wang, Shixin Liu, Xiwang Guo, Ziyan Zhao, Liang Qi
IEEE Systems, Man, and Cybernetics Magazine, pp. 24-31, published 2024-04-01
DOI: 10.1109/MSMC.2023.3303615 (https://doi.org/10.1109/MSMC.2023.3303615)
Citations: 0
Abstract
Selective optimal disassembly sequencing (SODS) is a methodology for disassembling waste products; mathematically, it is an optimization problem. In existing research, however, the connection between the optimization algorithms and the established model is limited to a few specific processes, so the approaches generalize poorly. Because each product to be disassembled has unique characteristics, most disassembly sequences require the mathematical model to be modified or even rebuilt. In this article, reinforcement learning (RL) is used to produce a single-item selective disassembly sequence based on the AND/OR graph. First, the AND/OR graph is mapped to a value matrix that represents the precedence relationships between components and the values of the components themselves. Second, on the basis of the established mathematical model and graph, value-based RL is used to solve the selective disassembly sequencing problem. Finally, the experimental results of the genetic algorithm (GA), Sarsa, deep Q-learning (DQN), and CPLEX are compared to verify the correctness of the proposed model and the effectiveness of the RL algorithm.
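To make the two steps in the abstract concrete, the following is a minimal sketch of how a value matrix derived from a disassembly graph could drive a value-based RL agent. It is not the article's model: the four-node graph, the numbers in `VALUE`, the hyperparameters, and all function names are hypothetical, and plain tabular Q-learning stands in for the value-based methods the article compares. A `None` entry marks a transition the graph does not allow, which is one simple way to encode precedence.

```python
import random

# Hypothetical 4-node disassembly graph (illustrative numbers, not from the article).
# VALUE[s][t] is the assumed net value (recovered value minus removal cost) of the
# operation that leads from subassembly s to subassembly t; None means the graph
# has no such operation, so precedence is encoded by the missing entries.
VALUE = [
    [None, -2.0, -1.0, None],
    [None, None, 0.5, 5.0],
    [None, None, None, 1.0],
    [None, None, None, None],  # node 3: target component retrieved, no further moves
]
START, TARGET = 0, 3


def feasible(state):
    """Return the successor nodes the value matrix allows from this state."""
    return [t for t, v in enumerate(VALUE[state]) if v is not None]


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over the value matrix."""
    rng = random.Random(seed)
    q = [[0.0] * len(VALUE) for _ in VALUE]
    for _ in range(episodes):
        state = START
        while state != TARGET:
            actions = feasible(state)
            if rng.random() < epsilon:
                nxt = rng.choice(actions)  # explore
            else:
                nxt = max(actions, key=lambda t: q[state][t])  # exploit
            # Best estimated value from the next node (0.0 at the terminal node).
            future = max((q[nxt][t] for t in feasible(nxt)), default=0.0)
            q[state][nxt] += alpha * (VALUE[state][nxt] + gamma * future - q[state][nxt])
            state = nxt
    return q


def greedy_sequence(q):
    """Follow the highest Q-value from START to TARGET and sum the net value."""
    state, path, total = START, [START], 0.0
    while state != TARGET:
        nxt = max(feasible(state), key=lambda t: q[state][t])
        total += VALUE[state][nxt]
        path.append(nxt)
        state = nxt
    return path, total


q_table = train()
best_path, best_value = greedy_sequence(q_table)
print(best_path, best_value)
```

Because the graph is acyclic and every node can reach the target, each episode terminates. Swapping the `max` in the update for the value of the action actually taken next would turn this sketch into Sarsa, one of the methods the abstract names.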