Muhammad Moiz, Hazique Malik, Muhammad Bilal, Noman Naseer
{"title":"$Q_{biased}$ Softmax回归算法的多偏置技术比较分析","authors":"Muhammad Moiz, Hazique Malik, Muhammad Bilal, Noman Naseer","doi":"10.1109/AIMS52415.2021.9466049","DOIUrl":null,"url":null,"abstract":"Over the past many years the popularity of robotic workers has seen a tremendous surge. Several tasks which were previously considered insurmountable are able to be performed by robots efficiently, with much ease. This is mainly due to the advances made in the field of control systems and artificial intelligence in recent years. Lately, we have seen Reinforcement Learning (RL) capture the spotlight, in the field of robotics. Instead of explicitly specifying the solution of a particular task, RL enables the robot (agent) to explore its environment and through trial and error choose the appropriate response. In this paper, a comparative analysis of biasing techniques for the Q-biased softmax regression (QBIASSR) algorithm has been presented. In QBIASSR, decision-making for un-explored states depends upon the set of previously explored states. This algorithm improves the learning process when the robot reaches unexplored states. A vector bias(s) is calculated on the basis of variable values of experienced states and added to the Q-value function for action selection. To obtain the optimized reward, different techniques to calculate bias(s) are adopted. The performance of all the techniques has been evaluated and compared for obstacle avoidance in the case of a mobile robot. In the end, we have demonstrated that the cumulative reward generated by the technique proposed in our paper is at least 2 times greater than the baseline.","PeriodicalId":299121,"journal":{"name":"2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Comparative Analysis of Multiple Biasing Techniques for $Q_{biased}$ Softmax Regression Algorithm\",\"authors\":\"Muhammad Moiz, Hazique Malik, Muhammad Bilal, Noman Naseer\",\"doi\":\"10.1109/AIMS52415.2021.9466049\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Over the past many years the popularity of robotic workers has seen a tremendous surge. Several tasks which were previously considered insurmountable are able to be performed by robots efficiently, with much ease. This is mainly due to the advances made in the field of control systems and artificial intelligence in recent years. Lately, we have seen Reinforcement Learning (RL) capture the spotlight, in the field of robotics. Instead of explicitly specifying the solution of a particular task, RL enables the robot (agent) to explore its environment and through trial and error choose the appropriate response. In this paper, a comparative analysis of biasing techniques for the Q-biased softmax regression (QBIASSR) algorithm has been presented. In QBIASSR, decision-making for un-explored states depends upon the set of previously explored states. This algorithm improves the learning process when the robot reaches unexplored states. A vector bias(s) is calculated on the basis of variable values of experienced states and added to the Q-value function for action selection. To obtain the optimized reward, different techniques to calculate bias(s) are adopted. The performance of all the techniques has been evaluated and compared for obstacle avoidance in the case of a mobile robot. 
In the end, we have demonstrated that the cumulative reward generated by the technique proposed in our paper is at least 2 times greater than the baseline.\",\"PeriodicalId\":299121,\"journal\":{\"name\":\"2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS)\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AIMS52415.2021.9466049\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIMS52415.2021.9466049","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Comparative Analysis of Multiple Biasing Techniques for $Q_{biased}$ Softmax Regression Algorithm
Over the past several years, the popularity of robotic workers has surged tremendously. Tasks that were previously considered insurmountable can now be performed by robots efficiently and with ease, mainly due to recent advances in control systems and artificial intelligence. Lately, Reinforcement Learning (RL) has captured the spotlight in the field of robotics. Instead of explicitly specifying the solution to a particular task, RL enables the robot (agent) to explore its environment and, through trial and error, choose the appropriate response. In this paper, a comparative analysis of biasing techniques for the Q-biased softmax regression (QBIASSR) algorithm is presented. In QBIASSR, decision-making for unexplored states depends on the set of previously explored states, which improves the learning process when the robot reaches unexplored states. A bias vector, bias(s), is calculated from the variable values of experienced states and added to the Q-value function for action selection. To maximize the obtained reward, different techniques for calculating bias(s) are adopted. The performance of all the techniques is evaluated and compared for obstacle avoidance with a mobile robot. In the end, we demonstrate that the cumulative reward generated by the technique proposed in this paper is at least twice that of the baseline.
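The mechanism described in the abstract, adding a state-dependent bias vector to the Q-values and then selecting actions with a softmax, can be illustrated with a short sketch. This is a minimal, assumed tabular implementation, not the authors' code: the function names (bias_from_experience, select_action), the averaging rule used to form bias(s), and the temperature parameter tau are illustrative assumptions.

```python
import numpy as np

def bias_from_experience(q_table, experienced_states):
    """One assumed way to form bias(s): average the Q-rows of previously
    explored states, so unexplored states inherit their action preferences."""
    if not experienced_states:
        return np.zeros(q_table.shape[1])
    return q_table[list(experienced_states)].mean(axis=0)

def select_action(state, q_table, experienced_states, tau=1.0, rng=None):
    """Softmax (Boltzmann) action selection over Q(s, a) + bias(s)."""
    rng = rng if rng is not None else np.random.default_rng()
    biased_q = q_table[state] + bias_from_experience(q_table, experienced_states)
    prefs = (biased_q - biased_q.max()) / tau    # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()  # softmax probabilities over actions
    return rng.choice(len(probs), p=probs)

# Toy usage: 5 states, 3 actions, one previously experienced state (index 0)
q_table = np.zeros((5, 3))
q_table[0] = [0.2, 0.8, 0.1]
action = select_action(state=3, q_table=q_table, experienced_states=[0])
```

In this sketch, lowering tau makes selection greedier with respect to the biased Q-values, while raising it encourages exploration; the different biasing techniques compared in the paper would correspond to different choices of bias_from_experience.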