{"title":"A Feature Analysis Tool for Batch RL Datasets","authors":"Ruiyang Xu, Zhengxing Chen","doi":"10.1145/3442442.3453147","DOIUrl":null,"url":null,"abstract":"Batch RL is concerned about learning a decision policy from a given dataset without interacting with the environment. Although research is actively conducted on learning-related issues (e.g., convergence speed, stability, and safety), empirical challenges before learning are largely ignored. Many RL practitioners face the challenge of determining whether a designed Markov Decision Process (MDP) is valid and meaningful. This study proposes a model-based method to check whether an MDP designed for a given dataset is well formulated through a heuristic-based feature analysis. We tested our method in constructed as well as more realistic environments. Our results show that our approach can identify potential problems of data. As far as we know, performing validity analysis on batch RL data is a novel direction, and we envision that our tool serves as a motivational example to help practitioners apply RL more easily.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Feature Analysis Tool for Batch RL Datasets\",\"authors\":\"Ruiyang Xu, Zhengxing Chen\",\"doi\":\"10.1145/3442442.3453147\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Batch RL is concerned about learning a decision policy from a given dataset without interacting with the environment. Although research is actively conducted on learning-related issues (e.g., convergence speed, stability, and safety), empirical challenges before learning are largely ignored. Many RL practitioners face the challenge of determining whether a designed Markov Decision Process (MDP) is valid and meaningful. 
This study proposes a model-based method to check whether an MDP designed for a given dataset is well formulated through a heuristic-based feature analysis. We tested our method in constructed as well as more realistic environments. Our results show that our approach can identify potential problems of data. As far as we know, performing validity analysis on batch RL data is a novel direction, and we envision that our tool serves as a motivational example to help practitioners apply RL more easily.\",\"PeriodicalId\":129420,\"journal\":{\"name\":\"Companion Proceedings of the Web Conference 2021\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Companion Proceedings of the Web Conference 2021\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3442442.3453147\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Proceedings of the Web Conference 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442442.3453147","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Batch RL is concerned with learning a decision policy from a given dataset without interacting with the environment. Although learning-related issues (e.g., convergence speed, stability, and safety) are actively researched, the empirical challenges that arise before learning are largely ignored. Many RL practitioners face the challenge of determining whether a designed Markov Decision Process (MDP) is valid and meaningful. This study proposes a model-based method that uses heuristic-based feature analysis to check whether an MDP designed for a given dataset is well formulated. We tested our method in constructed as well as more realistic environments. Our results show that our approach can identify potential problems in the data. As far as we know, performing validity analysis on batch RL data is a novel direction, and we envision our tool serving as a motivational example that helps practitioners apply RL more easily.