Policy Transition of Reinforcement Learning for an Agent Based SCM System
Authors: Gang Zhao, R. Sun
Published in: 2006 4th IEEE International Conference on Industrial Informatics
Publication date: 2006-08-01
DOI: 10.1109/INDIN.2006.275663
URL: https://doi.org/10.1109/INDIN.2006.275663
Citations: 6
Abstract
Reinforcement learning (RL) has been successfully applied to some dynamic and unpredictable domains. Supply chain management (SCM) is an NP-hard problem. Some proposed RL methods perform better than traditional tools for dynamic problem solving in SCM. RL supports online learning and performs efficiently in some applications, but an RL agent reacts worse than some heuristic methods to sudden changes in SCM demand, because the trial-and-error nature of RL is time-consuming in practice. By surveying an efficient policy transition mechanism in RL, which maps existing policies from a previous task to new policies for a changed task, this paper proposes a novel RL-agent-based SCM system that reduces the time the RL agent needs to learn in a dynamic environment. As a result, the RL agent derives the maximal profit using RL when jobs arrive with a stable distribution. Furthermore, through the policy transition mechanism, the RL agent makes the optimal procurement decisions that satisfy the requirements of sudden changes in the supply chain network.
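
The abstract does not spell out how the policy transition mechanism maps old policies to new ones. As a rough illustration of the general idea only, the sketch below uses the simplest possible stand-in: tabular Q-learning on a toy single-item inventory task, where the Q-table learned under one demand pattern is copied as the starting point after demand shifts (a warm start). The environment, the constants, and the names `step`, `transfer_policy`, and `q_learning` are all hypothetical and are not taken from the paper.

```python
import random

# All constants are hypothetical, chosen only to keep the toy task small.
MAX_STOCK = 5                          # warehouse capacity
MAX_ORDER = 3                          # largest order per step
PRICE, COST, HOLDING = 4.0, 1.0, 0.5   # unit sale price, order cost, holding cost


def step(stock, order, demand):
    """Apply one procurement decision; return (next_stock, reward)."""
    available = min(stock + order, MAX_STOCK)   # excess orders are discarded
    sold = min(available, demand)
    next_stock = available - sold
    reward = PRICE * sold - COST * order - HOLDING * next_stock
    return next_stock, reward


def new_q_table():
    """Q[stock][order], initialized to zero."""
    return [[0.0] * (MAX_ORDER + 1) for _ in range(MAX_STOCK + 1)]


def transfer_policy(old_q):
    """Warm start (an assumption, not the paper's mapping): reuse the old
    task's Q-values as the initial values for the changed task."""
    return [row[:] for row in old_q]


def q_learning(q, demand_choices, episodes, rng,
               alpha=0.1, gamma=0.9, epsilon=0.1, horizon=20):
    """Standard epsilon-greedy tabular Q-learning; updates q in place."""
    for _ in range(episodes):
        stock = 0
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(MAX_ORDER + 1)
            else:
                action = max(range(MAX_ORDER + 1), key=lambda a: q[stock][a])
            demand = rng.choice(demand_choices)
            next_stock, reward = step(stock, action, demand)
            target = reward + gamma * max(q[next_stock])
            q[stock][action] += alpha * (target - q[stock][action])
            stock = next_stock
    return q


def greedy_policy(q):
    """Best order quantity for each stock level under the current Q-table."""
    return [max(range(MAX_ORDER + 1), key=lambda a: row[a]) for row in q]


rng = random.Random(0)
# Previous task: demand is 1 or 2 units per step.
q_old = q_learning(new_q_table(), [1, 2], episodes=500, rng=rng)
# Changed task: demand jumps to 2 or 3 units; start from the old Q-values
# and train for far fewer episodes than a cold start would get.
q_new = q_learning(transfer_policy(q_old), [2, 3], episodes=50, rng=rng)
print(greedy_policy(q_old), greedy_policy(q_new))
```

Copying the table wholesale only makes sense here because both tasks share the same state and action spaces; a mapping between differently shaped tasks would need an explicit correspondence, which this sketch does not attempt.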