Biometrika Pub Date: 2024-01-20 DOI: 10.1093/biomet/asae001
D P Wiens
{"title":"A note on minimax robustness of designs against correlated or heteroscedastic responses","authors":"D P Wiens","doi":"10.1093/biomet/asae001","DOIUrl":"https://doi.org/10.1093/biomet/asae001","url":null,"abstract":"We present a result according to which certain functions of covariance matrices are maximized at scalar multiples of the identity matrix. This is used to show that experimental designs that are optimal under an assumption of independent, homoscedastic responses can be minimax robust, in broad classes of alternate covariance structures. In particular it can justify the common practice of disregarding possible dependence, or heteroscedasticity, at the design stage of an experiment.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139506157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-01-17 DOI: 10.1093/biomet/asae002
K Klockmann, T Krivobokova
{"title":"Efficient nonparametric estimation of Toeplitz covariance matrices","authors":"K Klockmann, T Krivobokova","doi":"10.1093/biomet/asae002","DOIUrl":"https://doi.org/10.1093/biomet/asae002","url":null,"abstract":"A new efficient nonparametric estimator for Toeplitz covariance matrices is proposed. This estimator is based on a data transformation that translates the problem of Toeplitz covariance matrix estimation to the problem of mean estimation in an approximate Gaussian regression. The resulting Toeplitz covariance matrix estimator is positive definite by construction, fully data-driven and computationally very fast. Moreover, this estimator is shown to be minimax optimal under the spectral norm for a large class of Toeplitz matrices. These results are readily extended to estimation of inverses of Toeplitz covariance matrices. Also, an alternative version of the Whittle likelihood for the spectral density based on the discrete cosine transform is proposed. The method is implemented in the R package vstdct that accompanies the paper.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139506380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2023-12-22 DOI: 10.1093/biomet/asad078
Jelle J Goeman, Aldo Solari
{"title":"On Selecting and Conditioning in Multiple Testing and Selective Inference","authors":"Jelle J Goeman, Aldo Solari","doi":"10.1093/biomet/asad078","DOIUrl":"https://doi.org/10.1093/biomet/asad078","url":null,"abstract":"We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. 
We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139051164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2023-12-22 DOI: 10.1093/biomet/asad080
P A Maugis
{"title":"Central limit theorems for local network statistics","authors":"P A Maugis","doi":"10.1093/biomet/asad080","DOIUrl":"https://doi.org/10.1093/biomet/asad080","url":null,"abstract":"Subgraph counts, in particular the number of occurrences of small shapes such as triangles, characterize properties of random networks. As a result, they have seen wide use as network summary statistics. Subgraphs are typically counted globally, making existing approaches unable to describe vertex-specific characteristics. In contrast, rooted subgraphs focus on vertex neighbourhoods, and are fundamental descriptors of local network properties. We derive the asymptotic joint distribution of rooted subgraph counts in inhomogeneous random graphs, a model which generalizes most statistical network models. This result enables a shift in the statistical analysis of graphs, from estimating network summaries, to estimating models linking local network structure and vertex-specific covariates. As an example, we consider a school friendship network and show that gender and race are significant predictors of local friendship patterns.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139051073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2023-12-21 DOI: 10.1093/biomet/asad079
Alexander Aue, Claudia Kirch
{"title":"The state of cumulative sum sequential change point testing seventy years after Page","authors":"Alexander Aue, Claudia Kirch","doi":"10.1093/biomet/asad079","DOIUrl":"https://doi.org/10.1093/biomet/asad079","url":null,"abstract":"Quality control charts aim at raising an alarm as soon as sequentially obtained observations of an underlying random process no longer seem to be within stochastic fluctuations prescribed by an ‘in-control’ scenario. Such random processes can often be modelled using the concept of stationarity, or even independence as in most classical works. An important out-of-control scenario is the changepoint alternative, for which the distribution of the process changes at an unknown point in time. In his seminal 1954 Biometrika paper, E. S. Page introduced the famous cumulative sum control charts for changepoint monitoring. Innovatively, decision rules based on cumulative sum procedures took the full history of the process into account, whereas previous procedures were based only on a fixed and typically small number of the most recent observations. The extreme case of using only the most recent observation, often referred to as the Shewhart chart, is more akin to serial outlier than changepoint detection. Page’s cumulative sum approach, introduced seven decades ago, is ubiquitous in modern changepoint analysis, and his original paper has led to a multitude of follow-up papers in different research communities. This review is focused on a particular subfield of this research, namely nonparametric sequential, or online, changepoint tests which are constructed to maintain a desired Type 1 error as opposed to the more traditional approach seeking to minimize the average run length of the procedures. Such tests have originated at the intersection of econometrics and statistics. 
We trace the development of these tests and highlight their properties, mostly using a simple location model for clarity of exposition, but also review more complex situations such as regression and time series models.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138951837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2023-12-20 DOI: 10.1093/biomet/asad077
{"title":"Correction to: ‘A cross-validation-based statistical theory for point processes’","authors":"","doi":"10.1093/biomet/asad077","DOIUrl":"https://doi.org/10.1093/biomet/asad077","url":null,"abstract":"","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139169267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2023-12-01 DOI: 10.1093/biomet/asad075
Shulei Wang, Bo Yuan, T Tony Cai, Hongzhe Li
{"title":"Phylogenetic Association Analysis with Conditional Rank Correlation","authors":"Shulei Wang, Bo Yuan, T Tony Cai, Hongzhe Li","doi":"10.1093/biomet/asad075","DOIUrl":"https://doi.org/10.1093/biomet/asad075","url":null,"abstract":"Phylogenetic association analysis plays a crucial role in investigating the correlation between microbial compositions and specific outcomes of interest in microbiome studies. However, existing methods for testing such associations have limitations related to the assumption of a linear association in high-dimensional settings and the handling of confounding effects. Therefore, there is a need for methods capable of characterizing complex associations, including nonmonotonic relationships. This paper introduces a novel phylogenetic association analysis framework and associated tests to address these challenges by employing conditional rank correlation as a measure of association. These tests account for confounders in a fully nonparametric manner, ensuring robustness against outliers and the ability to detect diverse dependencies. The proposed framework aggregates conditional rank correlations for subtrees using a weighted sum and maximum approach to capture both dense and sparse signals. The significance level of the test statistics is determined by calibrating through a nearest neighbour bootstrapping method, which is straightforward to implement and can accommodate additional datasets when available. 
The practical advantages of the proposed framework are demonstrated through numerical experiments utilizing both simulated and real microbiome datasets.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conformalized survival analysis with adaptive cutoffs","authors":"Yu Gui, Rohan Hore, Zhimei Ren, Rina Foygel Barber","doi":"10.1093/biomet/asad076","DOIUrl":"https://doi.org/10.1093/biomet/asad076","url":null,"abstract":"This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data. We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) to correct for the distribution shift introduced by this subsetting procedure. For our new method, instead of constraining to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can lead to LPBs that are less conservative and give more accurate information. We show that in the Type I right-censoring setting, if either of the censoring mechanism or the conditional quantile of survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage, where in the latter case we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage when compared with other competing methods. 
Finally, our method is applied to a real dataset to generate LPBs for users’ active times on a mobile app.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2023-12-01 Epub Date: 2023-02-06 DOI: 10.1093/biomet/asad007
Sijia Li, Alex Luedtke
{"title":"Efficient Estimation under Data Fusion.","authors":"Sijia Li, Alex Luedtke","doi":"10.1093/biomet/asad007","DOIUrl":"10.1093/biomet/asad007","url":null,"abstract":"We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources together. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including in the estimation of the average treatment effect and average reward under a policy, with the majority of them merging one historical data source with covariates, actions, and rewards and one data source of the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources together in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a general means to construct estimators that achieve these bounds. In numerical simulations, we illustrate marked improvements in efficiency from using our proposed estimators rather than their natural alternatives. 
Finally, we illustrate the magnitude of efficiency gains that can be realized in vaccine immunogenicity studies by fusing data from two HIV vaccine trials.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10653189/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44309457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2023-11-28 DOI: 10.1093/biomet/asad074
Ryan Thompson, Catherine S Forbes, Steven N Maceachern, Mario Peruggia
{"title":"Familial inference: Tests for hypotheses on a family of centres","authors":"Ryan Thompson, Catherine S Forbes, Steven N Maceachern, Mario Peruggia","doi":"10.1093/biomet/asad074","DOIUrl":"https://doi.org/10.1093/biomet/asad074","url":null,"abstract":"Statistical hypotheses are translations of scientific hypotheses into statements about one or more distributions, often concerning their centre. Tests that assess statistical hypotheses of centre implicitly assume a specific centre, e.g., the mean or median. Yet, scientific hypotheses do not always specify a particular centre. This ambiguity leaves the possibility for a gap between scientific theory and statistical practice that can lead to rejection of a true null. In the face of replicability crises in many scientific disciplines, significant results of this kind are concerning. Rather than testing a single centre, this paper proposes testing a family of plausible centres, such as that induced by the Huber loss function. Each centre in the family generates a testing problem, and the resulting family of hypotheses constitutes a familial hypothesis. A Bayesian nonparametric procedure is devised to test familial hypotheses, enabled by a novel pathwise optimization routine to fit the Huber family. The favourable properties of the new test are demonstrated theoretically and experimentally. Two examples from psychology serve as real-world case studies.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}