Authors: Jens Heger, T. Voss
Venue: 2020 Winter Simulation Conference (WSC), pp. 1608-1618
Published: 2020-12-14
DOI: https://doi.org/10.1109/WSC48552.2020.9383903
Dynamically Changing Sequencing Rules with Reinforcement Learning in a Job Shop System With Stochastic Influences
Sequencing operations can be difficult, especially under uncertain conditions. Applying decentralized sequencing rules has been a viable option; however, no single rule outperforms all others under varying system performance. For this reason, reinforcement learning (RL) is used as a hyper-heuristic that selects a sequencing rule based on the system status. The advantages of RL are demonstrated across multiple training scenarios that include stochastic influences, such as varying interarrival times or customers changing the product mix. For evaluation, the trained agents are exploited in a generic manufacturing system. The best trained agent dynamically adjusts sequencing rules based on system performance, thereby matching and outperforming the presumed best static sequencing rules by ≈ 3%. When the trained policy is applied to an unknown scenario, the RL heuristic is still able to change the sequencing rule according to the system status, thereby providing robust performance.
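The idea of RL as a hyper-heuristic over sequencing rules can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the learning algorithm, state features, reward, or rule set, so the tabular Q-learning agent, the single-machine queue, the three rules (FIFO, SPT, EDD), the `observe_state` features, and the negative-tardiness reward below are all assumptions chosen for brevity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    arrival: float
    processing_time: float
    due_date: float


# Static sequencing rules (assumed set): each picks the next job from a non-empty queue.
RULES = {
    "FIFO": lambda queue: min(queue, key=lambda j: j.arrival),
    "SPT": lambda queue: min(queue, key=lambda j: j.processing_time),
    "EDD": lambda queue: min(queue, key=lambda j: j.due_date),
}
RULE_NAMES = list(RULES)


def observe_state(queue, now):
    """Hypothetical system status: (queue-length bucket, is any queued job already late)."""
    return (min(len(queue) // 3, 2), any(j.due_date < now for j in queue))


class RuleSelector:
    """Tabular epsilon-greedy Q-learning whose actions are sequencing rules."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = {}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def select(self, state, explore=True):
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(RULE_NAMES)
        values = [self.q.get((state, r), 0.0) for r in RULE_NAMES]
        return RULE_NAMES[values.index(max(values))]

    def update(self, state, rule, reward, next_state):
        best_next = max(self.q.get((next_state, r), 0.0) for r in RULE_NAMES)
        old = self.q.get((state, rule), 0.0)
        self.q[(state, rule)] = old + self.alpha * (reward + self.gamma * best_next - old)


def make_jobs(n, rng):
    """Generate jobs with stochastic interarrival and processing times."""
    jobs, t = [], 0.0
    for i in range(n):
        t += rng.expovariate(1.0)
        p = rng.uniform(0.5, 1.5)
        jobs.append(Job(i, t, p, t + 3.0 * p))
    return jobs


def run_episode(agent, jobs, explore=True):
    """Process all jobs on one machine and return the total tardiness."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, now, total_tardiness, prev = [], 0.0, 0.0, None
    while pending or queue:
        if not queue:  # machine idles until the next arrival
            now = max(now, pending[0].arrival)
        while pending and pending[0].arrival <= now:
            queue.append(pending.pop(0))
        state = observe_state(queue, now)
        if prev is not None and explore:
            agent.update(*prev, state)
        rule = agent.select(state, explore)
        job = RULES[rule](queue)  # the chosen rule decides which job runs next
        queue.remove(job)
        now += job.processing_time
        tardiness = max(0.0, now - job.due_date)
        total_tardiness += tardiness
        prev = (state, rule, -tardiness)
    if prev is not None and explore:
        agent.update(*prev, "terminal")  # terminal state has no Q-values, so its value is 0
    return total_tardiness


if __name__ == "__main__":
    rng = random.Random(42)
    agent = RuleSelector()
    for _ in range(200):  # training episodes with exploration
        run_episode(agent, make_jobs(50, rng))
    print(run_episode(agent, make_jobs(50, rng), explore=False))
```

Calling `run_episode` with `explore=False` applies the learned policy greedily and leaves the Q-table unchanged, which loosely mirrors evaluating a trained agent; a real job shop would have multiple machines and a richer state than this toy queue.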