Value-Based Reinforcement Learning for Selective Disassembly Sequence Optimization Problems: Demonstrating and Comparing a Proposed Model
Authors: Shujin Qin, Zhiliang Bi, Jiacun Wang, Shixin Liu, Xiwang Guo, Ziyan Zhao, Liang Qi
Journal: IEEE Systems, Man, and Cybernetics Magazine, pp. 24-31
Published: 2024-04-01
DOI: https://doi.org/10.1109/MSMC.2023.3303615
Selective optimal disassembly sequencing (SODS) is a methodology for disassembling waste products; mathematically, it is an optimization problem. In existing research, however, the link between the optimization algorithms and the established model is limited to specific processes, so the approaches generalize poorly. Because each product to be disassembled has unique characteristics, most disassembly sequencing methods require the mathematical model to be modified or even rebuilt. In this article, reinforcement learning (RL) is used to produce a single-item selective disassembly sequence based on the AND/OR graph. First, the AND/OR graph is mapped to a value matrix that represents both the precedence relationships between components and the values of the components themselves. Second, on the basis of the established mathematical model and graph, value-based RL is used to solve the selective disassembly sequencing problem. Finally, the experimental results of a genetic algorithm (GA), Sarsa, deep Q-learning (DQN), and CPLEX are compared to verify the correctness of the proposed model and the effectiveness of the RL algorithm.
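
To make the value-matrix idea concrete, the following is a minimal sketch of tabular Q-learning (one value-based RL method) on a toy product. Everything in it is an assumption for illustration, not the article's formulation: the 4-component product, the numbers, the encoding (diagonal entries hold a component's net value, off-diagonal entry `M[i][j] = 1` means component `i` must be removed before `j`), the bitmask state, and the `STOP` action that makes the sequence selective. A simple precedence matrix stands in for the full AND/OR graph.

```python
import random
from collections import defaultdict

# Hypothetical value matrix for a 4-component product (illustrative numbers).
# M[i][i]: net value of removing component i (recovered value minus cost).
# M[i][j] == 1 for i != j: component i must be removed before component j.
M = [
    [-2, 1, 0, 0],
    [0, 5, 0, 0],
    [0, 0, -6, 1],
    [0, 0, 0, 4],
]
N = len(M)
STOP = N  # extra action: end disassembly early (the "selective" part)


def actions(state):
    """Return removable components for a bitmask of removed parts, plus STOP."""
    feasible = []
    for j in range(N):
        if (state >> j) & 1:
            continue  # component j is already removed
        preds_done = all(
            (state >> i) & 1 for i in range(N) if i != j and M[i][j]
        )
        if preds_done:
            feasible.append(j)
    return feasible + [STOP]


def train(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        state = 0  # nothing removed yet
        while True:
            acts = actions(state)
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: q[(state, x)])
            if a == STOP:
                # Stopping is terminal and earns no further reward.
                q[(state, a)] += alpha * (0.0 - q[(state, a)])
                break
            nxt = state | (1 << a)
            best_next = max(q[(nxt, b)] for b in actions(nxt))
            target = M[a][a] + gamma * best_next
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
    return q


def greedy_sequence(q):
    """Roll out the greedy policy and return (sequence, total net value)."""
    state, seq, profit = 0, [], 0
    while True:
        a = max(actions(state), key=lambda x: q[(state, x)])
        if a == STOP:
            return seq, profit
        seq.append(a)
        profit += M[a][a]
        state |= 1 << a


if __name__ == "__main__":
    sequence, profit = greedy_sequence(train())
    print("sequence:", sequence, "net value:", profit)
```

On this toy instance, removing components 0 and 1 yields a net value of 3, while continuing to components 2 and 3 would lose 2, so the greedy rollout should stop after the first two. Sarsa would differ only in the update target, which uses the action actually taken in the next state instead of the maximum.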