A Robust Algorithm to Unifying Offline Causal Inference and Online Multi-armed Bandit Learning
Qiaoqiao Tang, Hong Xie. 2021 IEEE International Conference on Data Mining (ICDM), December 2021. DOI: 10.1109/ICDM51629.2021.00071
Using offline logged data to improve sequential, or online, decision making is attracting growing attention. VirUCB is one of the most recent notable algorithmic frameworks in this line of research, with both sound theoretical guarantees and strong empirical performance. However, two questions about VirUCB remain open: (1) how imbalanced offline logged data affects decision-making accuracy, and (2) how to schedule offline logged data across the decision-making horizon so as to reduce the amount of offline data consumed. We show that with imbalanced offline logged data, VirUCB can learn more slowly than the baseline algorithm that uses no offline logged data. This finding motivates the RobVirUCB algorithm, which is robust to such imbalanced data, i.e., it still maintains a fast learning speed. RobVirUCB adaptively selects “useful” offline logged data to speed up learning, and it has theoretical regret guarantees. Finally, we design the EffVirUCB algorithm, which reduces the offline logged data consumption of RobVirUCB. EffVirUCB schedules offline logged data to the decision rounds in which the decision maker may select suboptimal arms, and it also has theoretical regret guarantees. Extensive experiments on both synthetic and real-world data validate the superior performance of RobVirUCB and EffVirUCB.
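The abstract does not spell out VirUCB's update rule. As a rough, non-authoritative illustration of the general idea of letting offline logged data speed up online bandit learning, the sketch below warm-starts the standard UCB1 algorithm by counting each logged reward as a "virtual" pull of its arm. It is not the authors' VirUCB, RobVirUCB, or EffVirUCB: it skips the causal-inference step that would turn logged data into per-arm reward samples, it does not select "useful" data or schedule data across rounds, and it assumes Bernoulli rewards. The names `warm_start_ucb`, `offline`, and `true_means` are hypothetical, and using the total (virtual plus online) count inside the logarithm is likewise an assumption.

```python
import math
import random


def warm_start_ucb(true_means, offline, horizon, seed=0):
    """UCB1 whose per-arm statistics are pre-loaded with offline samples.

    true_means: hypothetical Bernoulli reward means, one per arm.
    offline: hypothetical dict mapping arm index -> list of logged rewards.
    Returns the list of arms pulled during the online phase.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    # Assumption: every logged reward is treated as a virtual pull of its arm.
    for arm, rewards in offline.items():
        counts[arm] += len(rewards)
        sums[arm] += sum(rewards)

    pulls = []
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # An arm with neither offline nor online data is pulled first.
            arm = untried[0]
        else:
            total = sum(counts)  # virtual + online pulls (assumed choice)
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(total) / counts[a]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls
```

Passing an empty `offline` dict recovers plain UCB1, a natural no-offline-data baseline for comparison; passing many logged samples per arm shrinks the exploration bonuses, so the online phase can commit to a good arm almost immediately.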