{"title":"电梯控制的顺序决策","authors":"Emre Oner Tartan, Cebrail Ciflikli","doi":"10.12720/jait.14.5.1124-1131","DOIUrl":null,"url":null,"abstract":"—In the last decade Reinforcement Learning (RL) has significantly changed the conventional control paradigm in many fields. RL approach is spreading with many applications such as autonomous driving and industry automation. Markov Decision Process (MDP) forms a mathematical idealized basis for RL if the explicit model is available. Dynamic programming allows to find an optimal policy for sequential decision making in a MDP. In this study we consider the elevator control as a sequential decision making problem, describe it as a MDP with finite state space and solve it using dynamic programming. At each decision making time step we aim to take the optimal action to minimize the total of hall call waiting times in the episodic task. We consider a sample 6-floor building and simulate the proposed method in comparison with the conventional Nearest Car Method (NCM).","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Sequential Decision Making for Elevator Control\",\"authors\":\"Emre Oner Tartan, Cebrail Ciflikli\",\"doi\":\"10.12720/jait.14.5.1124-1131\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"—In the last decade Reinforcement Learning (RL) has significantly changed the conventional control paradigm in many fields. RL approach is spreading with many applications such as autonomous driving and industry automation. Markov Decision Process (MDP) forms a mathematical idealized basis for RL if the explicit model is available. Dynamic programming allows to find an optimal policy for sequential decision making in a MDP. In this study we consider the elevator control as a sequential decision making problem, describe it as a MDP with finite state space and solve it using dynamic programming. At each decision making time step we aim to take the optimal action to minimize the total of hall call waiting times in the episodic task. We consider a sample 6-floor building and simulate the proposed method in comparison with the conventional Nearest Car Method (NCM).\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.12720/jait.14.5.1124-1131\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.12720/jait.14.5.1124-1131","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: In the last decade, Reinforcement Learning (RL) has significantly changed the conventional control paradigm in many fields, and the RL approach is spreading across applications such as autonomous driving and industrial automation. A Markov Decision Process (MDP) provides an idealized mathematical basis for RL when an explicit model is available, and dynamic programming allows an optimal policy for sequential decision making in an MDP to be found. In this study we formulate elevator control as a sequential decision-making problem, describe it as an MDP with a finite state space, and solve it using dynamic programming. At each decision-making time step, the aim is to take the optimal action that minimizes the total hall-call waiting time in the episodic task. We consider a sample six-floor building and simulate the proposed method in comparison with the conventional Nearest Car Method (NCM).
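
The abstract describes casting elevator control as a finite-state MDP and solving it with dynamic programming so as to minimize the total hall-call waiting time. Below is a minimal, hypothetical sketch of that idea for a single car in a six-floor building; the state encoding (car floor plus the set of floors with pending hall calls), the per-step cost (number of waiting calls), and all function names are illustrative assumptions, not the authors' exact formulation.

# Minimal value-iteration sketch (illustrative assumptions, not the paper's model):
# state = (car floor, tuple of floors with pending hall calls),
# per-step cost = number of waiting calls, episode ends when no calls remain.

from itertools import chain, combinations

FLOORS = range(6)          # sample six-floor building, floors 0..5
ACTIONS = (-1, 0, +1)      # move down, stop/serve at current floor, move up

def all_call_sets():
    """Every subset of floors that may hold a pending hall call."""
    return chain.from_iterable(combinations(FLOORS, k) for k in range(len(FLOORS) + 1))

def step(floor, calls, action):
    """Deterministic transition: serve a call at the current floor or move the car."""
    if action == 0:
        calls = tuple(c for c in calls if c != floor)   # serve the call here, if any
    else:
        floor = min(max(floor + action, 0), len(FLOORS) - 1)
    return floor, calls

def value_iteration(tol=1e-6):
    """Compute the undiscounted cost-to-go V(s) for every state of the episodic task."""
    states = [(f, calls) for f in FLOORS for calls in map(tuple, all_call_sets())]
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for (floor, calls) in states:
            if not calls:                 # terminal: every hall call has been served
                continue
            best = min(len(calls) + V[step(floor, calls, a)] for a in ACTIONS)
            delta = max(delta, abs(best - V[(floor, calls)]))
            V[(floor, calls)] = best
        if delta < tol:
            return V

def greedy_action(V, floor, calls):
    """Policy extraction: pick the action minimizing one-step cost plus cost-to-go."""
    return min(ACTIONS, key=lambda a: len(calls) + V[step(floor, calls, a)])

if __name__ == "__main__":
    V = value_iteration()
    print(greedy_action(V, 0, (3, 5)))    # car at floor 0, calls waiting at floors 3 and 5

In this toy episodic shortest-path setting, a policy that never clears the calls accumulates unbounded cost while a policy reaching the terminal state clearly exists, so undiscounted value iteration converges to the optimal cost-to-go; the actual state space, cost structure, and multi-car considerations in the paper may differ.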