Asymptotic Theory of the Best-Choice Rerandomization using the Mahalanobis Distance
Authors: Yuhao Wang, Xinran Li
Source: arXiv - MATH - Statistics Theory, arxiv-2312.02513 (https://doi.org/arxiv-2312.02513)
Published: 2023-12-05
Abstract
Rerandomization, a design that utilizes pretreatment covariates and improves
their balance between different treatment groups, has received attention
recently in both theory and practice. There are at least two types of
rerandomization that are used in practice: the first rerandomizes the treatment
assignment until covariate imbalance is below a prespecified threshold; the
second randomizes the treatment assignment multiple times and chooses the one
with the best covariate balance. In this paper we will consider the second type
of rerandomization, namely the best-choice rerandomization, whose theory and
inference are still lacking in the literature. In particular, we will focus on
the best-choice rerandomization that uses the Mahalanobis distance to measure
covariate imbalance, which is one of the most commonly used imbalance measures
for multivariate covariates and is invariant to affine transformations of
covariates. We will study the large-sample repeated sampling properties of
the best-choice rerandomization, allowing both the number of covariates and the
number of tried complete randomizations to increase with the sample size. We
show that the asymptotic distribution of the difference-in-means estimator is
more concentrated around the true average treatment effect under
rerandomization than under complete randomization, and propose large-sample
accurate confidence intervals for rerandomization that are shorter than those
for the completely randomized experiment. We further demonstrate that, with a
moderate number of covariates and with the number of tried randomizations
increasing polynomially with the sample size, the best-choice rerandomization
can achieve the ideally optimal precision that one can expect even with
perfectly balanced covariates. The developed theory and methods for
rerandomization are also illustrated using real field experiments.
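
The best-choice design described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the usual form of the Mahalanobis distance for the difference in covariate means between treated and control units, scaled by its covariance under complete randomization. The function names `mahalanobis_imbalance` and `best_choice_rerandomization`, and the parameters `n_treated` and `n_tries`, are hypothetical and chosen here for illustration only.

```python
import numpy as np


def mahalanobis_imbalance(X, z):
    """Mahalanobis distance between treated and control covariate means.

    X: (n, K) array of pretreatment covariates.
    z: length-n 0/1 assignment vector (1 = treated).
    Assumed form: diff' [n / (n1 * n0) * S_X]^{-1} diff, where S_X is the
    finite-population covariance of X (divisor n - 1).
    """
    n = len(z)
    n1 = int(z.sum())
    n0 = n - n1
    diff = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))
    cov_diff = n / (n1 * n0) * S
    return float(diff @ np.linalg.solve(cov_diff, diff))


def best_choice_rerandomization(X, n_treated, n_tries, rng):
    """Draw n_tries complete randomizations and keep the best-balanced one."""
    n = X.shape[0]
    best_z, best_m = None, np.inf
    for _ in range(n_tries):
        # One complete randomization: n_treated units chosen without replacement.
        z = np.zeros(n, dtype=int)
        z[rng.choice(n, size=n_treated, replace=False)] = 1
        m = mahalanobis_imbalance(X, z)
        if m < best_m:
            best_z, best_m = z, m
    return best_z, best_m


# Illustrative usage on simulated covariates (not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
z, m = best_choice_rerandomization(X, n_treated=20, n_tries=50, rng=rng)
```

By contrast, the first type of rerandomization mentioned in the abstract would loop until the distance falls below a prespecified threshold instead of fixing the number of tries in advance.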