Nonparametric Two-Sample Testing by Betting

Impact factor 2.2; Zone 3, Computer Science; JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Information Theory Pub Date : 2023-08-16 DOI:10.1109/TIT.2023.3305867

Shubhanshu Shekhar;Aaditya Ramdas

IEEE Transactions on Information Theory, vol. 70, no. 2, pp. 1178-1203. URL: https://ieeexplore.ieee.org/document/10220229/


Abstract

We study the problem of designing consistent sequential two-sample tests in a nonparametric setting. Guided by the principle of testing by betting, we reframe this task into that of selecting a sequence of payoff functions that maximize the wealth of a fictitious bettor, betting against the null in a repeated game. In this setting, the relative increase in the bettor’s wealth has a precise interpretation as the measure of evidence against the null, and thus our sequential test rejects the null when the wealth crosses an appropriate threshold. We develop a general framework for setting up the betting game for two-sample testing, in which the payoffs are selected by a prediction strategy as data-driven predictable estimates of the witness function associated with the variational representation of some statistical distance measures, such as integral probability metrics (IPMs). We then formally relate the statistical properties of the test (such as consistency, type-II error exponent and expected sample size) to the regret of the corresponding prediction strategy. We construct a practical sequential two-sample test by instantiating our general strategy with the kernel-MMD metric, and demonstrate its ability to adapt to the difficulty of the unknown alternative through theoretical and empirical results. Our framework is versatile, and easily extends to other problems; we illustrate this by applying our approach to construct consistent tests for the following problems: (i) time-varying two-sample testing with non-exchangeable observations, and (ii) an abstract class of “invariant” testing problems, including symmetry and independence testing.
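The betting construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes paired observations (X_t, Y_t), a Gaussian kernel with a fixed bandwidth, a plug-in empirical MMD witness scaled by 1/2 so that each payoff lies in [-1, 1], and an Online-Newton-Step-style update for the betting fraction. The function names, the bandwidth, and the step constant are illustrative choices. Because the witness and the bet at round t use only data from rounds before t, the wealth is a nonnegative martingale under the null, and rejecting when it reaches 1/alpha keeps the type-I error at most alpha by Ville's inequality.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between rows of a (n, d) and b (m, d); values in [0, 1]."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def witness(z, past_x, past_y, bandwidth=1.0):
    """Plug-in MMD witness at point z, built from past samples only.

    The factor 0.5 keeps the value in [-1/2, 1/2], so the payoff
    witness(x) - witness(y) always lies in [-1, 1].
    """
    if len(past_x) == 0:
        return 0.0  # no data yet, so no bet can be informed
    z = z[None, :]
    mean_kx = gaussian_kernel(z, past_x, bandwidth).mean()
    mean_ky = gaussian_kernel(z, past_y, bandwidth).mean()
    return 0.5 * (mean_kx - mean_ky)


def sequential_mmd_betting_test(xs, ys, alpha=0.05, bandwidth=1.0):
    """Sequential two-sample test by betting (illustrative sketch).

    xs, ys: arrays of shape (n, d), observed one pair per round.
    Returns (rejected, stopping_round, final_wealth).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    wealth = 1.0       # initial wealth of the fictitious bettor
    bet = 0.0          # betting fraction, kept in [0, 1/2]
    curvature = 1.0    # running sum of squared gradients (ONS-style)
    step = 2.0 / (2.0 - np.log(3.0))  # ONS step constant (assumed choice)

    for t in range(len(xs)):
        # Predictable payoff: the witness is fit on rounds 0..t-1 only.
        g_x = witness(xs[t], xs[:t], ys[:t], bandwidth)
        g_y = witness(ys[t], xs[:t], ys[:t], bandwidth)
        payoff = g_x - g_y

        wealth *= 1.0 + bet * payoff  # factor is at least 1/2, so wealth > 0
        if wealth >= 1.0 / alpha:
            return True, t + 1, wealth

        # Gradient step on the loss -log(1 + bet * payoff), then clip.
        grad = -payoff / (1.0 + bet * payoff)
        curvature += grad * grad
        bet = float(np.clip(bet - step * grad / curvature, 0.0, 0.5))

    return False, len(xs), wealth
```

As a usage example, calling sequential_mmd_betting_test(xs, ys) on two well-separated samples, such as draws from N(0, 1) and N(3, 1), should stop early with a rejection, while feeding the same array as both samples produces zero payoffs and leaves the wealth at 1. Any predictable betting fraction in [0, 1/2] preserves validity under the null; the update rule only affects how quickly wealth grows under the alternative, which is where the regret of the prediction strategy enters.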




Source journal

IEEE Transactions on Information Theory (Engineering and Technology; Engineering: Electrical and Electronic)

CiteScore: 5.70

Self-citation rate: 20.00%

Articles published: 514

Review time: 12 months

Journal description: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.