Stochastic parallel machine scheduling using reinforcement learning

Juxihong Julaiti, Seog-Chan Oh, Dyutimoy Das, Soundar Kumara
Journal of Advanced Manufacturing and Processing, vol. 4, no. 4, published 2022-04-22. DOI: 10.1002/amp2.10119
Open-access PDF: https://aiche.onlinelibrary.wiley.com/doi/epdf/10.1002/amp2.10119
In a high-mix, low-volume manufacturing facility, heterogeneous jobs require frequent reconfiguration of machines, which increases the chance of unplanned machine breakdowns. Because machines are often nonidentical and their performance degrades over time, it is critical to account for machine heterogeneity and non-stationarity during scheduling. We propose a reinforcement learning-based framework with a novel sampling method that trains an agent to schedule heterogeneous jobs on non-stationary, unreliable parallel machines so as to minimize weighted tardiness. The results indicate that the new sampling approach expedites the learning process and that the resulting policy significantly outperforms static dispatching rules.
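The scheduling objective named in the abstract, weighted tardiness, is conventionally defined as the sum over jobs of w_j · max(0, C_j − d_j), where w_j is a job's priority weight, d_j its due date, and C_j its completion time under a given schedule. The sketch below is an illustration of that standard objective only; the `Job` fields and function name are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    weight: float      # priority weight w_j
    due: float         # due date d_j
    completion: float  # completion time C_j under some schedule


def weighted_tardiness(jobs: list[Job]) -> float:
    """Sum of w_j * max(0, C_j - d_j) over all jobs."""
    return sum(j.weight * max(0.0, j.completion - j.due) for j in jobs)


# Example: first job finishes 2 time units late, second finishes early.
jobs = [
    Job(weight=2.0, due=10.0, completion=12.0),  # tardiness 2, contributes 4.0
    Job(weight=1.0, due=8.0, completion=7.0),    # on time, contributes 0.0
]
print(weighted_tardiness(jobs))  # -> 4.0
```

A scheduling policy, whether a learned one or a static dispatching rule, is then evaluated by the weighted tardiness of the schedules it produces.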