{"title":"Extreme value inference for heterogeneous power law data","authors":"John H.J. Einmahl, Yi He","doi":"10.1214/23-aos2294","DOIUrl":"https://doi.org/10.1214/23-aos2294","url":null,"abstract":"We extend extreme value statistics to independent data with possibly very different distributions. In particular, we present novel asymptotic normality results for the Hill estimator, which now estimates the extreme value index of the average distribution. Due to the heterogeneity, the asymptotic variance can be substantially smaller than that in the i.i.d. case. As a special case, we consider a heterogeneous scales model where the asymptotic variance can be calculated explicitly. The primary tool for the proofs is the functional central limit theorem for a weighted tail empirical process. We also present asymptotic normality results for the extreme quantile estimator. A simulation study shows the good finite-sample behavior of our limit theorems. We also present applications to assess the tail heaviness of earthquake energies and of cross-sectional stock market losses.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135046050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference on the maximal rank of time-varying covariance matrices using high-frequency data","authors":"Markus Reiss, Lars Winkelmann","doi":"10.1214/23-aos2273","DOIUrl":"https://doi.org/10.1214/23-aos2273","url":null,"abstract":"We study the rank of the instantaneous or spot covariance matrix ΣX(t) of a multidimensional process X(t). Given high-frequency observations X(i/n), i=0,…,n, we test the null hypothesis rank(ΣX(t))≤r for all t against local alternatives where the average (r+1)st eigenvalue is larger than some signal detection rate vn. A major problem is that the inherent averaging in local covariance statistics produces a bias that distorts the rank statistics. We show that the bias depends on the regularity and spectral gap of ΣX(t). We establish explicit matrix perturbation and concentration results that provide nonasymptotic uniform critical values and optimal signal detection rates vn. This leads to a rank estimation method via sequential testing. For a class of stochastic volatility models, we determine data-driven critical values via normed p-variations of estimated local covariance matrices. The methods are illustrated by simulations and an application to high-frequency data of U.S. government bonds.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"483 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135673417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimally tackling covariate shift in RKHS-based nonparametric regression","authors":"Cong Ma, Reese Pathak, Martin J. Wainwright","doi":"10.1214/23-aos2268","DOIUrl":"https://doi.org/10.1214/23-aos2268","url":null,"abstract":"We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR does not require full knowledge of the likelihood ratio apart from an upper bound on it. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naïve estimator, which minimizes the empirical risk over the function class, is strictly suboptimal under covariate shift as compared to KRR. We then address the larger class of covariate shift problems where the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. 
Again, we are able to show that this estimator is minimax optimal, up to logarithmic factors.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135673416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On high-dimensional Poisson models with measurement error: Hypothesis testing for nonlinear nonconvex optimization.","authors":"Fei Jiang, Yeqing Zhou, Jianxuan Liu, Yanyuan Ma","doi":"10.1214/22-aos2248","DOIUrl":"https://doi.org/10.1214/22-aos2248","url":null,"abstract":"<p><p>We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to augment an amenable penalty term to the target function. We propose to estimate the regression parameter through minimizing the penalized target function. We derive the <i>L</i><sub>1</sub> and <i>L</i><sub>2</sub> convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite sample performance of the proposed tests by extensive simulation. 
Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"51 1","pages":"233-259"},"PeriodicalIF":4.5,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10438917/pdf/nihms-1868138.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10054730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On High dimensional Poisson models with measurement error: hypothesis testing for nonlinear nonconvex optimization","authors":"Fei Jiang, Yeqing Zhou, Jianxuan Liu, Yanyuan Ma","doi":"10.48550/arXiv.2301.00139","DOIUrl":"https://doi.org/10.48550/arXiv.2301.00139","url":null,"abstract":"We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to augment an amenable penalty term to the target function. We propose to estimate the regression parameter through minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite sample performance of the proposed tests by extensive simulation. 
Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"51 1 1","pages":"233-259"},"PeriodicalIF":4.5,"publicationDate":"2022-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45193479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan A Murphy
{"title":"BATCH POLICY LEARNING IN AVERAGE REWARD MARKOV DECISION PROCESSES.","authors":"Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan A Murphy","doi":"10.1214/22-aos2231","DOIUrl":"https://doi.org/10.1214/22-aos2231","url":null,"abstract":"<p><p>We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"50 6","pages":"3364-3387"},"PeriodicalIF":4.5,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10072865/pdf/nihms-1837036.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9270218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics | Pub Date: 2022-10-01 | Epub Date: 2022-10-27 | DOI: 10.1214/22-aos2210
Yijian Huang, Martin G Sanda
{"title":"LINEAR BIOMARKER COMBINATION FOR CONSTRAINED CLASSIFICATION.","authors":"Yijian Huang, Martin G Sanda","doi":"10.1214/22-aos2210","DOIUrl":"https://doi.org/10.1214/22-aos2210","url":null,"abstract":"<p><p>Multiple biomarkers are often combined to improve disease diagnosis. The uniformly optimal combination, i.e., with respect to all reasonable performance metrics, unfortunately requires excessive distributional modeling, to which the estimation can be sensitive. An alternative strategy is rather to pursue local optimality with respect to a specific performance metric. Nevertheless, existing methods may not target clinical utility of the intended medical test, which usually needs to operate above a certain sensitivity or specificity level, or do not have their statistical properties well studied and understood. In this article, we develop and investigate a linear combination method to maximize the clinical utility empirically for such a constrained classification. The combination coefficient is shown to have cube root asymptotics. The convergence rate and limiting distribution of the predictive performance are subsequently established, exhibiting robustness of the method in comparison with others. An algorithm with sound statistical justification is devised for efficient and high-quality computation. Simulations corroborate the theoretical results, and demonstrate good statistical and computational performance. 
Illustration with a clinical study on aggressive prostate cancer detection is provided.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"50 5","pages":"2793-2815"},"PeriodicalIF":4.5,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635489/pdf/nihms-1819429.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40449706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics | Pub Date: 2022-06-01 | Epub Date: 2022-04-28 | DOI: 10.1007/s10029-022-02619-5
Omar Yusef Kudsi, Georges Kaoukabani, Naseem Bou-Ayash, Kelly Vallar, Alexandra Chudner, Sara LaGrange, Fahri Gokcal
{"title":"Quality of life and surgical outcomes of robotic retromuscular ventral hernia repair using a new hybrid mesh reinforcement.","authors":"Omar Yusef Kudsi, Georges Kaoukabani, Naseem Bou-Ayash, Kelly Vallar, Alexandra Chudner, Sara LaGrange, Fahri Gokcal","doi":"10.1007/s10029-022-02619-5","DOIUrl":"10.1007/s10029-022-02619-5","url":null,"abstract":"<p><strong>Purpose: </strong>The purpose of this study is to prospectively evaluate surgical and quality of life (QoL) outcomes of robotic retromuscular ventral hernia repair (rRMVHR) using a new hybrid mesh in high-risk patients.</p><p><strong>Methods: </strong>Data was prospectively collected for patients classified as high-risk based on the modified ventral hernia working group (VHWG) grading system, who underwent rRMVHR using Synecor™ Pre hybrid mesh in a single center, between 2019 and 2020. Pre-, intra- and postoperative variables including hernia recurrence, surgical site events (SSE), hernia-specific quality of life (QoL), and financial costs were analyzed. QoL assessments were obtained from preoperative and postoperative patient visits. Kaplan-Meier survival analysis was performed to analyze the estimated recurrence-free time.</p><p><strong>Results: </strong>Fifty-two high-risk patients, with a mean (±SD) age of 58.6 ± 13.7 years and BMI of 36.9 ± 6.6 kg/m<sup>2</sup>, were followed for a mean (±SD) period of 22.4 ± 7.1 months. A total of 11 (21.2%) patients experienced postoperative complications, out of which eight were SSEs, including 7 (13.5%) seromas, 1 (1.9%) hematoma, and no infections. Procedural interventions were required for 2 (3.8%) surgical site occurrences. Recurrence was seen in 1 (1.9%) patient. The estimated mean (95% confidence interval) recurrence-free time was 33 (32.3-34.5) months. Postoperative QoL assessments demonstrated significant improvements in comparison to preoperative QoL, with a minimum ∆mean (±SD) of -15.5 ± 2.2 at one month (p < 0.001). 
The mean (±SD) procedure cost was $13,924.18 ± 7856.95 which includes the average mesh cost ($5390.12 ± 3817.03).</p><p><strong>Conclusion: </strong>Our study showed favorable early and mid-term outcomes, in addition to significant improvements in QoL, after rRMVHR using Synecor™ hybrid mesh in high-risk patients.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"20 1","pages":"881-888"},"PeriodicalIF":2.3,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88522489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics | Pub Date: 2022-06-01 | Epub Date: 2022-06-16 | DOI: 10.1214/21-aos2152
Zijian Guo, Domagoj Ćevid, Peter Bühlmann
{"title":"DOUBLY DEBIASED LASSO: HIGH-DIMENSIONAL INFERENCE UNDER HIDDEN CONFOUNDING.","authors":"Zijian Guo, Domagoj Ćevid, Peter Bühlmann","doi":"10.1214/21-aos2152","DOIUrl":"https://doi.org/10.1214/21-aos2152","url":null,"abstract":"<p><p>Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected by hidden confounding and propose the <i>Doubly Debiased Lasso</i> estimator for individual components of the regression coefficient vector. Our advocated method simultaneously corrects both the bias due to estimation of high-dimensional parameters as well as the bias caused by the hidden confounding. We establish its asymptotic normality and also prove that it is efficient in the Gauss-Markov sense. The validity of our methodology relies on a dense confounding assumption, i.e. that every confounding variable affects many covariates. The finite sample performance is illustrated with an extensive simulation study and a genomic application.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"50 3","pages":"1320-1347"},"PeriodicalIF":4.5,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9365063/pdf/nihms-1824950.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40608265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION.","authors":"Hongyuan Cao, Jun Chen, Xianyang Zhang","doi":"10.1214/21-aos2128","DOIUrl":"https://doi.org/10.1214/21-aos2128","url":null,"abstract":"<p><p>Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses <i>a priori</i>, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of <i>p</i>-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. 
Datasets from genome-wide association studies are used to illustrate the new methodology.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"50 2","pages":"807-857"},"PeriodicalIF":4.5,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153594/pdf/nihms-1840915.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9776938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}