{"title":"A Sequential Test for Selecting the Better Variant: Online A/B testing, Adaptive Allocation, and Continuous Monitoring","authors":"Nianqiao Ju, D. Hu, Adam Henderson, Liangjie Hong","doi":"10.1145/3289600.3291025","DOIUrl":null,"url":null,"abstract":"Online A/B tests play an instrumental role for Internet companies to improve products and technologies in a data-driven manner. An online A/B test, in its most straightforward form, can be treated as a static hypothesis test where traditional statistical tools such as p-values and power analysis might be applied to help decision makers determine which variant performs better. However, a static A/B test presents both time cost and the opportunity cost for rapid product iterations. For time cost, a fast-paced product evolution pushes its shareholders to consistently monitor results from online A/B experiments, which usually invites peeking and altering experimental designs as data collected. It is recognized that this flexibility might harm statistical guarantees if not introduced in the right way, especially when online tests are considered as static hypothesis tests. For opportunity cost, a static test usually entails a static allocation of users into different variants, which prevents an immediate roll-out of the better version to larger audience or risks of alienating users who may suffer from a bad experience. While some works try to tackle these challenges, no prior method focuses on a holistic solution to both issues. In this paper, we propose a unified framework utilizing sequential analysis and multi-armed bandit to address time cost and the opportunity cost of static online tests simultaneously. In particular, we present an imputed sequential Girshick test that accommodates online data and dynamic allocation of data. The unobserved potential outcomes are treated as missing data and are imputed using empirical averages. Focusing on the binomial model, we demonstrate that the proposed imputed Girshick test achieves Type-I error and power control with both a fixed allocation ratio and an adaptive allocation such as Thompson Sampling through extensive experiments. In addition, we also run experiments on historical Etsy.com A/B tests to show the reduction in opportunity cost when using the proposed method.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3289600.3291025","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 20
Abstract
Online A/B tests play an instrumental role in helping Internet companies improve products and technologies in a data-driven manner. An online A/B test, in its most straightforward form, can be treated as a static hypothesis test, where traditional statistical tools such as p-values and power analysis can help decision makers determine which variant performs better. However, a static A/B test imposes both a time cost and an opportunity cost on rapid product iteration. Regarding time cost, fast-paced product evolution pushes stakeholders to continuously monitor results from online A/B experiments, which usually invites peeking at results and altering experimental designs as data are collected. It is well recognized that this flexibility can invalidate statistical guarantees if not introduced in the right way, especially when online tests are treated as static hypothesis tests. Regarding opportunity cost, a static test usually entails a static allocation of users across variants, which prevents an immediate roll-out of the better version to a larger audience and risks alienating users who suffer a bad experience. While some prior works tackle one of these challenges, no existing method offers a holistic solution to both. In this paper, we propose a unified framework that combines sequential analysis and multi-armed bandits to address the time cost and the opportunity cost of static online tests simultaneously. In particular, we present an imputed sequential Girshick test that accommodates streaming data and dynamic allocation of traffic. Unobserved potential outcomes are treated as missing data and imputed using empirical averages. Focusing on the binomial model, we demonstrate through extensive experiments that the proposed imputed Girshick test achieves Type-I error control and adequate power under both a fixed allocation ratio and adaptive allocation schemes such as Thompson Sampling. In addition, we run experiments on historical Etsy.com A/B tests to show the reduction in opportunity cost achieved by the proposed method.
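The abstract does not spell out the test statistic itself, so the following Python sketch is only illustrative of the ingredients it names: Thompson Sampling allocation over two binomial arms, empirical-average imputation of the unobserved potential outcome, and a simple symmetric random-walk stopping rule in the spirit of a Girshick-type paired-difference test. The function name `sequential_ab_test`, the boundary constant, and the neutral 0.5 prior imputation are hypothetical choices for this sketch, not the paper's calibrated procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sequential_ab_test(p_a, p_b, boundary=30.0, max_n=20000):
    """Illustrative sequential test with Thompson Sampling allocation.

    Each user is routed to one arm by Thompson Sampling; the outcome of
    the arm NOT shown is treated as missing and imputed with that arm's
    running empirical average, so a paired outcome difference can be
    tracked at every step (mirroring the imputation idea in the abstract).
    """
    successes = np.zeros(2)   # observed conversions per arm
    trials = np.zeros(2)      # observed exposures per arm
    walk = 0.0                # cumulative (imputed) A-minus-B difference

    for n in range(1, max_n + 1):
        # Thompson Sampling: sample from Beta(1 + wins, 1 + losses) posteriors
        # and show the arm with the larger draw.
        draws = rng.beta(1 + successes, 1 + trials - successes)
        arm = int(np.argmax(draws))

        # Observe the chosen arm's binary outcome.
        y = rng.binomial(1, [p_a, p_b][arm])
        successes[arm] += y
        trials[arm] += 1

        # Impute the unobserved arm with its empirical average so far
        # (0.5 before any data, as a neutral placeholder -- an assumption
        # of this sketch, not specified in the abstract).
        other = 1 - arm
        imputed = successes[other] / trials[other] if trials[other] > 0 else 0.5
        diff = (y - imputed) if arm == 0 else (imputed - y)
        walk += diff

        # Random-walk stopping rule: a placeholder symmetric boundary,
        # NOT the calibrated Girshick boundary from the paper.
        if abs(walk) >= boundary:
            return ("A" if walk > 0 else "B"), n
    return "no decision", max_n

print(sequential_ab_test(p_a=0.11, p_b=0.10))
```

Because allocation adapts while the test runs, the better arm receives most of the traffic well before the boundary is crossed, which is the opportunity-cost reduction the abstract claims; the paper's contribution is showing that Type-I error and power guarantees survive this adaptivity, which the placeholder boundary above does not by itself ensure.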