{"title":"A Sequential Test for Selecting the Better Variant: Online A/B testing, Adaptive Allocation, and Continuous Monitoring","authors":"Nianqiao Ju, D. Hu, Adam Henderson, Liangjie Hong","doi":"10.1145/3289600.3291025","DOIUrl":null,"url":null,"abstract":"Online A/B tests play an instrumental role for Internet companies to improve products and technologies in a data-driven manner. An online A/B test, in its most straightforward form, can be treated as a static hypothesis test where traditional statistical tools such as p-values and power analysis might be applied to help decision makers determine which variant performs better. However, a static A/B test presents both time cost and the opportunity cost for rapid product iterations. For time cost, a fast-paced product evolution pushes its shareholders to consistently monitor results from online A/B experiments, which usually invites peeking and altering experimental designs as data collected. It is recognized that this flexibility might harm statistical guarantees if not introduced in the right way, especially when online tests are considered as static hypothesis tests. For opportunity cost, a static test usually entails a static allocation of users into different variants, which prevents an immediate roll-out of the better version to larger audience or risks of alienating users who may suffer from a bad experience. While some works try to tackle these challenges, no prior method focuses on a holistic solution to both issues. In this paper, we propose a unified framework utilizing sequential analysis and multi-armed bandit to address time cost and the opportunity cost of static online tests simultaneously. In particular, we present an imputed sequential Girshick test that accommodates online data and dynamic allocation of data. The unobserved potential outcomes are treated as missing data and are imputed using empirical averages. Focusing on the binomial model, we demonstrate that the proposed imputed Girshick test achieves Type-I error and power control with both a fixed allocation ratio and an adaptive allocation such as Thompson Sampling through extensive experiments. In addition, we also run experiments on historical Etsy.com A/B tests to show the reduction in opportunity cost when using the proposed method.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3289600.3291025","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 20
Abstract
Online A/B tests play an instrumental role in helping Internet companies improve products and technologies in a data-driven manner. An online A/B test, in its most straightforward form, can be treated as a static hypothesis test, where traditional statistical tools such as p-values and power analysis can help decision makers determine which variant performs better. However, a static A/B test imposes both a time cost and an opportunity cost on rapid product iteration. Regarding time cost, fast-paced product evolution pushes stakeholders to continuously monitor results from online A/B experiments, which usually invites peeking at results and altering experimental designs as data are collected. It is well recognized that this flexibility can invalidate statistical guarantees if not introduced in the right way, especially when online tests are treated as static hypothesis tests. Regarding opportunity cost, a static test usually entails a static allocation of users across variants, which prevents an immediate roll-out of the better version to a larger audience and risks alienating users who suffer a bad experience. While some prior works tackle one of these challenges, no existing method offers a holistic solution to both. In this paper, we propose a unified framework that combines sequential analysis and multi-armed bandits to address the time cost and the opportunity cost of static online tests simultaneously. In particular, we present an imputed sequential Girshick test that accommodates streaming data and dynamic allocation of traffic. Unobserved potential outcomes are treated as missing data and imputed using empirical averages. Focusing on the binomial model, we demonstrate through extensive experiments that the proposed imputed Girshick test achieves Type-I error control and adequate power under both a fixed allocation ratio and adaptive allocation schemes such as Thompson Sampling. In addition, we run experiments on historical Etsy.com A/B tests to show the reduction in opportunity cost achieved by the proposed method.
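The abstract does not spell out the test statistic itself, so the following Python sketch is only illustrative of the ingredients it names: Thompson Sampling allocation over two binomial arms, empirical-average imputation of the unobserved potential outcome, and a simple symmetric random-walk stopping rule in the spirit of a Girshick-type paired-difference test. The function name `sequential_ab_test`, the boundary constant, and the neutral 0.5 prior imputation are hypothetical choices for this sketch, not the paper's calibrated procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sequential_ab_test(p_a, p_b, boundary=30.0, max_n=20000):
    """Illustrative sequential test with Thompson Sampling allocation.

    Each user is routed to one arm by Thompson Sampling; the outcome of
    the arm NOT shown is treated as missing and imputed with that arm's
    running empirical average, so a paired outcome difference can be
    tracked at every step (mirroring the imputation idea in the abstract).
    """
    successes = np.zeros(2)   # observed conversions per arm
    trials = np.zeros(2)      # observed exposures per arm
    walk = 0.0                # cumulative (imputed) A-minus-B difference

    for n in range(1, max_n + 1):
        # Thompson Sampling: sample from Beta(1 + wins, 1 + losses) posteriors
        # and show the arm with the larger draw.
        draws = rng.beta(1 + successes, 1 + trials - successes)
        arm = int(np.argmax(draws))

        # Observe the chosen arm's binary outcome.
        y = rng.binomial(1, [p_a, p_b][arm])
        successes[arm] += y
        trials[arm] += 1

        # Impute the unobserved arm with its empirical average so far
        # (0.5 before any data, as a neutral placeholder -- an assumption
        # of this sketch, not specified in the abstract).
        other = 1 - arm
        imputed = successes[other] / trials[other] if trials[other] > 0 else 0.5
        diff = (y - imputed) if arm == 0 else (imputed - y)
        walk += diff

        # Random-walk stopping rule: a placeholder symmetric boundary,
        # NOT the calibrated Girshick boundary from the paper.
        if abs(walk) >= boundary:
            return ("A" if walk > 0 else "B"), n
    return "no decision", max_n

print(sequential_ab_test(p_a=0.11, p_b=0.10))
```

Because allocation adapts while the test runs, the better arm receives most of the traffic well before the boundary is crossed, which is the opportunity-cost reduction the abstract claims; the paper's contribution is showing that Type-I error and power guarantees survive this adaptivity, which the placeholder boundary above does not by itself ensure.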