{"title":"REINFORCEMENT LEARNING FOR INDIVIDUAL OPTIMAL POLICY FROM HETEROGENEOUS DATA.","authors":"By Rui Miao, Babak Shahbaba, Annie Qu","doi":"10.1214/25-aos2512","DOIUrl":"10.1214/25-aos2512","url":null,"abstract":"<p><p>Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time-stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data application demonstrate the superior numerical performance of the proposed method compared with existing methods.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"53 4","pages":"1513-1534"},"PeriodicalIF":3.7,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12439830/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145079494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NONLINEAR GLOBAL FRÉCHET REGRESSION FOR RANDOM OBJECTS VIA WEAK CONDITIONAL EXPECTATION.
Satarupa Bhattacharjee, Bing Li, Lingzhou Xue
Annals of Statistics 53(1): 117-143. Pub Date: 2025-02-01. Epub Date: 2025-02-13. DOI: 10.1214/24-aos2457.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12407180/pdf/

Abstract: Random objects are complex non-Euclidean data taking values in general metric spaces, possibly devoid of any underlying vector space structure. Such data are becoming increasingly abundant with the rapid advancement in technology. Examples include probability distributions, positive semidefinite matrices and data on Riemannian manifolds. However, except for regression for object-valued response with Euclidean predictors and distribution-on-distribution regression, there has been limited development of a general framework for object-valued response with object-valued predictors in the literature. To fill this gap, we introduce the notion of a weak conditional Fréchet mean based on Carleman operators and then propose a global nonlinear Fréchet regression model through the reproducing kernel Hilbert space (RKHS) embedding. Furthermore, we establish the relationships between the conditional Fréchet mean and the weak conditional Fréchet mean for both Euclidean and object-valued data. We also show that the state-of-the-art global Fréchet regression developed by Petersen and Müller (Ann. Statist. 47 (2019) 691-719) emerges as a special case of our method by choosing a linear kernel. We require that the metric space for the predictor admits a reproducing kernel, while the intrinsic geometry of the metric space for the response is utilized to study the asymptotic properties of the proposed estimates. Numerical studies, including extensive simulations and a real application, are conducted to investigate the finite-sample performance.
{"title":"A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules.","authors":"Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J Su","doi":"10.1214/24-aos2468","DOIUrl":"10.1214/24-aos2468","url":null,"abstract":"<p><p>Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key-provided by the LLM to the verifier-to control the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks-one of which has been internally implemented at OpenAI-and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. These theoretically derived detection rules are demonstrated to be competitive and sometimes enjoy a higher power than existing detection approaches through numerical experiments.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"53 1","pages":"322-351"},"PeriodicalIF":3.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467635/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145184574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ON BLOCKWISE AND REFERENCE PANEL-BASED ESTIMATORS FOR GENETIC DATA PREDICTION IN HIGH DIMENSIONS.
Bingxin Zhao, Shurong Zheng, Hongtu Zhu
Annals of Statistics 52(3): 948-965. Pub Date: 2024-06-01. Epub Date: 2024-08-11. DOI: 10.1214/24-aos2378.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11391480/pdf/

Abstract: Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having access only to summary-level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrices. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.
Minimax rates for heterogeneous causal effect estimation.
Edward H Kennedy, Sivaraman Balakrishnan, James M Robins, Larry Wasserman
Annals of Statistics 52(2): 793-816. Pub Date: 2024-04-01. Epub Date: 2024-05-09. DOI: 10.1214/24-aos2369.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11960818/pdf/

Abstract: Estimation of heterogeneous causal effects, i.e., how effects of policies and treatments vary across subjects, is a fundamental task in causal inference. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but questions surrounding optimality have remained largely unanswered. In particular, a minimax theory of optimality has yet to be developed, with the minimax rate of convergence and construction of rate-optimal estimators remaining open problems. In this paper we derive the minimax rate for CATE estimation, in a Hölder-smooth nonparametric model, and present a new local polynomial estimator, giving high-level conditions under which it is minimax optimal. Our minimax lower bound is derived via a localized version of the method of fuzzy hypotheses, combining lower bound constructions for nonparametric regression and functional estimation. Our proposed estimator can be viewed as a local polynomial R-Learner, based on a localized modification of higher-order influence function methods. The minimax rate we find exhibits several interesting features, including a non-standard elbow phenomenon and an unusual interpolation between nonparametric regression and functional estimation rates. The latter quantifies how the CATE, as an estimand, can be viewed as a regression/functional hybrid.
RANK-BASED INDICES FOR TESTING INDEPENDENCE BETWEEN TWO HIGH-DIMENSIONAL VECTORS.
Yeqing Zhou, Kai Xu, Liping Zhu, Runze Li
Annals of Statistics 52(1): 184-206. Pub Date: 2024-02-01. Epub Date: 2024-03-07. DOI: 10.1214/23-aos2339.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11064990/pdf/

Abstract: To test independence between two high-dimensional random vectors, we propose three tests based on the rank-based indices derived from Hoeffding's D, Blum-Kiefer-Rosenblatt's R and Bergsma-Dassios-Yanagimoto's τ*. Under the null hypothesis of independence, we show that the distributions of the proposed test statistics converge to normal ones if the dimensions diverge arbitrarily with the sample size. We further derive an explicit rate of convergence. Thanks to the monotone transformation-invariant property, these distribution-free tests can be readily applied to generally distributed random vectors, including heavy-tailed ones. We further study the local power of the proposed tests and compare their relative efficiencies with two classic distance covariance/correlation based tests in high-dimensional settings. We establish explicit relationships between D, R, τ* and Pearson's correlation for bivariate normal random variables. The relationships serve as a basis for power comparison. Our theoretical results show that under a Gaussian equicorrelation alternative, (i) the proposed tests are superior to the two classic distance covariance/correlation based tests if the components of the random vectors have very different scales; (ii) the asymptotic efficiencies of the proposed tests based on D, τ* and R are sorted in descending order.
SUPERVISED HOMOGENEITY FUSION: A COMBINATORIAL APPROACH.
Wen Wang, Shihao Wu, Ziwei Zhu, Ling Zhou, Peter X-K Song
Annals of Statistics 52(1): 285-310. Pub Date: 2024-02-01. Epub Date: 2024-03-07. DOI: 10.1214/23-aos2347.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12327361/pdf/

Abstract: Fusing regression coefficients into homogeneous groups can unveil those coefficients that share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and unleashes sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called L0-Fusion that is amenable to mixed integer optimization (MIO). On the statistical aspect, we identify a fundamental quantity called MSE grouping sensitivity that underpins the difficulty of recovering the true groups. We show that L0-Fusion achieves grouping consistency under the weakest possible requirement of the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification will fail to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply L0-Fusion with a sure screening set of features without any essential loss of statistical efficiency, while reducing the computational cost substantially. On the algorithmic aspect, we provide an MIO formulation for L0-Fusion along with a warm start strategy. Simulation and real data analysis demonstrate that L0-Fusion exhibits superiority over its competitors in terms of grouping accuracy.
{"title":"Order-of-addition orthogonal arrays to study the effect of treatment ordering","authors":"Eric D. Schoen, Robert W. Mee","doi":"10.1214/23-aos2317","DOIUrl":"https://doi.org/10.1214/23-aos2317","url":null,"abstract":"The effect of the order in which a set of m treatments is applied can be modeled by relative-position factors that indicate whether treatment i is carried out before or after treatment j, or by the absolute position for treatment i in the sequence. A design with the same normalized information matrix as the design with all m! sequences is D- and G-optimal for the main-effects model involving the relative-position factors. We prove that such designs are also I-optimal for this model and D-optimal as well as G- and I-optimal for the first-order model in the absolute-position factors. We propose a methodology for a complete or partial enumeration of nonequivalent designs that are optimal for both models.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135055038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matching recovery threshold for correlated random graphs","authors":"Jian Ding, Hang Du","doi":"10.1214/23-aos2305","DOIUrl":"https://doi.org/10.1214/23-aos2305","url":null,"abstract":"For two correlated graphs which are independently sub-sampled from a common Erdős–Rényi graph G(n,p), we wish to recover their latent vertex matching from the observation of these two graphs without labels. When p=n−α+o(1) for α∈(0,1], we establish a sharp information-theoretic threshold for whether it is possible to correctly match a positive fraction of vertices. Our result sharpens a constant factor in a recent work by Wu, Xu and Yu.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135055279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-selection inference via algorithmic stability","authors":"Tijana Zrnic, Michael I. Jordan","doi":"10.1214/23-aos2303","DOIUrl":"https://doi.org/10.1214/23-aos2303","url":null,"abstract":"When the target of statistical inference is chosen in a data-driven manner, the guarantees provided by classical theories vanish. We propose a solution to the problem of inference after selection by building on the framework of algorithmic stability, in particular its branch with origins in the field of differential privacy. Stability is achieved via randomization of selection and it serves as a quantitative measure that is sufficient to obtain nontrivial post-selection corrections for classical confidence intervals. Importantly, the underpinnings of algorithmic stability translate directly into computational efficiency—our method computes simple corrections for selective inference without recourse to Markov chain Monte Carlo sampling.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135165184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}