Biometrika | Pub Date: 2023-06-10 | DOI: 10.1093/biomet/asad028
Title: Correction to: Ancestor regression in linear structural equation models
Biometrika | Pub Date: 2023-06-08 | DOI: 10.1093/biomet/asad037
Xin Bing, Marten Wegkamp
Title: Interpolating discriminant functions in high-dimensional Gaussian latent mixtures
Abstract: This paper considers binary classification of high-dimensional features under a postulated model with a low-dimensional latent Gaussian mixture structure and nonvanishing noise. A generalized least-squares estimator is used to estimate the direction of the optimal separating hyperplane. The estimated hyperplane is shown to interpolate on the training data. While the direction vector can be consistently estimated, as could be expected from recent results in linear regression, a naive plug-in estimate fails to consistently estimate the intercept. A simple correction, which requires an independent hold-out sample, renders the procedure minimax optimal in many scenarios. The interpolation property of the latter procedure can be retained, but surprisingly depends on the way the labels are encoded.
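The interpolation property mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simpler setting than the paper: ordinary minimum-norm least squares (via NumPy's `pinv`) stands in for the authors' generalized least-squares estimator, the data are a toy class-shifted Gaussian rather than a latent mixture, and no intercept is fitted. The function name `min_norm_direction` is hypothetical. With more features than samples, the fitted linear scores reproduce the encoded labels exactly.

```python
import numpy as np

def min_norm_direction(X, y):
    """Minimum-norm least-squares solution beta = pinv(X) @ y.

    If p > n and X has full row rank, X @ beta equals y exactly,
    so the fitted linear scores interpolate the encoded labels.
    """
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
n, p = 50, 500                                      # far more features than samples
labels = rng.integers(0, 2, size=n)                 # binary classes 0/1
y = 2.0 * labels - 1.0                              # one possible encoding: -1/+1
X = rng.standard_normal((n, p)) + 0.5 * y[:, None]  # toy class-shifted Gaussian data

beta = min_norm_direction(X, y)
scores = X @ beta
print(np.allclose(scores, y))                       # True: training labels are reproduced
print(np.mean(np.sign(scores) == y))                # 1.0: zero training error
```

The sketch fits no intercept, so it does not reproduce the abstract's point that a naive plug-in intercept is inconsistent or the hold-out correction that fixes it.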
Biometrika | Pub Date: 2023-06-01 | DOI: 10.1093/biomet/asac042
Matthew J Tudball, Rachael A Hughes, Kate Tilling, Jack Bowden, Qingyuan Zhao
Title: Sample-constrained partial identification with application to selection bias
Volume 110(2), pages 485-498
Abstract: Many partial identification problems can be characterized by the optimal value of a function over a set, where both the function and the set need to be estimated from empirical data. Despite some progress for convex problems, statistical inference in this general setting remains to be developed. To address this, we derive an asymptotically valid confidence interval for the optimal value through an appropriate relaxation of the estimated set. We then apply this general result to the problem of selection bias in population-based cohort studies. We show that existing sensitivity analyses, which are often conservative and difficult to implement, can be formulated in our framework and made significantly more informative via auxiliary information on the population. We conduct a simulation study to evaluate the finite-sample performance of our inference procedure, and conclude with a substantive motivating example on the causal effect of education on income in the highly selected UK Biobank cohort. We demonstrate that our method can produce informative bounds using plausible population-level auxiliary constraints. We implement this method in the [Formula: see text] package [Formula: see text].
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10183833/pdf/
Biometrika | Pub Date: 2023-05-11 | DOI: 10.1093/biomet/asad032
F. Castelletti, S. Peluso
Title: Bayesian learning of network structures from interventional experimental data
Abstract: Directed acyclic graphs (DAGs) provide an effective framework for learning causal relationships among variables from multivariate observations. With purely observational data, DAGs encoding the same conditional independencies cannot be distinguished and are collected into Markov equivalence classes. In many contexts, however, observational measurements are supplemented by interventional data that improve DAG identifiability and enhance causal effect estimation. We propose a Bayesian framework for multivariate data partially generated after stochastic interventions. To this end, we introduce an effective prior elicitation procedure that leads to a closed-form expression for the DAG marginal likelihood and guarantees score equivalence among DAGs that are Markov equivalent post intervention. Under the Gaussian setting we show, in terms of posterior ratio consistency, that the true network will be asymptotically recovered, regardless of the specific distribution of the intervened variables and of the relative asymptotic dominance between observational and interventional measurements. We validate our theoretical results in simulations, and we implement a Markov chain Monte Carlo sampler for posterior inference on the space of DAGs, applying it to both synthetic and biological protein expression data.
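The notion of Markov equivalence in this abstract can be made concrete with the standard characterization for purely observational data (Verma and Pearl): two DAGs on the same nodes are Markov equivalent exactly when they share the same skeleton and the same v-structures (unshielded colliders). The sketch below checks that criterion; it does not cover the finer, post-intervention equivalence the paper uses, it assumes both inputs are acyclic and defined on the same node set, and the helper names are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (u, v) meaning u -> v."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Unshielded colliders a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), c))
    return found

def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures (observational equivalence only)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = {("X", "Y"), ("Y", "Z")}     # X -> Y -> Z
fork = {("Y", "X"), ("Y", "Z")}      # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}  # X -> Y <- Z
print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain and the fork encode the same conditional independence (X independent of Z given Y), so observational data cannot separate them; the collider encodes a different one.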
Biometrika | Pub Date: 2023-04-04 | DOI: 10.1093/biomet/asad024
Anna Calissano, Aasa Feragen, S. Vantini
Title: Populations of Unlabelled Networks: Graph Space Geometry and Generalized Geodesic Principal Components
Abstract: Statistical analysis for populations of networks is widely applicable but challenging, as networks have strongly non-Euclidean behaviour. Graph space is an exhaustive framework for studying populations of unlabelled networks which are weighted or unweighted, uni- or multi-layered, directed or undirected. Viewing graph space as the quotient of a Euclidean space with respect to a finite group action, we show that it is not a manifold, and that its curvature is unbounded from above. Within this geometrical framework we define generalized geodesic principal components, and we introduce the align all and compute algorithms, all of which allow for the computation of statistics on graph space. The statistics and algorithms are compared with existing methods and empirically validated on three real datasets, showcasing the framework's potential utility. The whole framework is implemented within the geomstats Python package.
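The quotient view of graph space can be sketched under a standard set of assumptions from the graph-space literature: a network on n nodes is an n x n adjacency matrix (a point in Euclidean space), the finite group is the set of node permutations acting on rows and columns simultaneously, and the distance between two unlabelled networks is the smallest Frobenius distance over all relabellings. The sketch brute-forces that minimum, so it only suits tiny graphs. It is not the geomstats implementation, and `graph_space_distance` is a hypothetical name.

```python
import itertools
import numpy as np

def graph_space_distance(A, B):
    """min over node permutations P of ||A - P B P^T||_F (brute force, n! cost).

    Returns the distance and the permutation of B's nodes that attains it.
    """
    n = A.shape[0]
    best, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        idx = np.array(perm)
        B_perm = B[np.ix_(idx, idx)]    # relabel B: permute rows and columns together
        d = np.linalg.norm(A - B_perm)  # Frobenius norm for a 2-D array
        if d < best:
            best, best_perm = d, perm
    return best, best_perm

# A small weighted path graph and the same network with its nodes relabelled.
A = np.array([[0, 1, 0],
              [1, 0, 2],
              [0, 2, 0]], dtype=float)
perm = [2, 0, 1]
B = A[np.ix_(perm, perm)]
print(graph_space_distance(A, B)[0])  # 0.0: identical as unlabelled networks
print(np.linalg.norm(A - B))          # nonzero: distance between the labelled matrices
```

Once the optimal permutation is known, aligning B to A is what an "align, then compute" style iteration would repeat across a population; that loop is not shown here.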
Biometrika | Pub Date: 2023-03-23 | eCollection Date: 2023-01-01 | DOI: 10.5114/cipp/155945
Mariola Łaguna, Emilia Mielniczuk, Wiktor Razmus
Title: Development and initial validation of the Daily Goal Realization Scale
Volume 51(1), pages 240-250
Abstract:
Background: This paper presents the results of three studies allowing the design and initial validation of the Daily Goal Realization Scale (DGRS). Goal realization refers to the engagement in goal-directed behavior that leads to progress in personal goal attainment; it is considered one of the adaptive personal characteristics.
Participants and procedure: Three studies, including an initial study to develop and select the items (Study 1), an intensive longitudinal study (Study 2), and a multiple goal evaluation study (Study 3), tested the factorial structure, reliability, and validity of the measure.
Results: Multilevel confirmatory factor analysis confirmed the unidimensional structure of the DGRS (obtained in Study 1) both at the individual and goal level, captured as daily goal realization (Study 2) and as multiple goal realization (Study 3). The validity of the DGRS was supported by meaningful associations with other goal evaluations (Study 3). As expected, the DGRS was positively related to evaluations of progress in goal achievement, engagement, likelihood of success, and goal importance. The DGRS also demonstrated measurement invariance, allowing for meaningful comparisons of scores between men and women.
Conclusions: The findings indicate that the DGRS is a brief and reliable idiographic measure of daily goal realization. The scale has excellent internal consistency and good criterion validity.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654345/pdf/
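Internal consistency is often summarized by Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The abstract does not say which reliability coefficient the authors report, and its multilevel design calls for level-specific reliability, so this single-level sketch is only an illustration of the general idea. The responses are hypothetical, not DGRS data, and `cronbach_alpha` is a hypothetical helper.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, k_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-item sample variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Hypothetical responses: 5 respondents x 3 items.
responses = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
    [1, 2, 1],
])
print(round(cronbach_alpha(responses), 3))
```

Items that move together across respondents push alpha toward 1; perfectly identical items give exactly 1.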
Biometrika | Pub Date: 2023-03-21 | DOI: 10.1093/biomet/asad021
Dimitris N Politis
Title: Scalable subsampling: computation, aggregation and inference
Abstract: Subsampling has seen a resurgence in the big data era, where the standard, full-resample-size bootstrap can be infeasible to compute. Nevertheless, even choosing a single random subsample of size $b$ can be computationally challenging when both $b$ and the sample size $n$ are very large. This paper shows how a set of appropriately chosen, nonrandom subsamples can be used to conduct effective, and computationally feasible, subsampling distribution estimation. Furthermore, the same set of subsamples can be used to yield a procedure for subsampling aggregation, also known as subagging, that is scalable with big data. Interestingly, the scalable subagging estimator can be tuned to have the same or a better rate of convergence than that of the full-sample estimator $\hat{\theta}_n$. Statistical inference could then be based on the scalable subagging estimator instead of the original $\hat{\theta}_n$.
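The use of nonrandom subsamples for subagging and for subsampling inference can be sketched as follows, under assumptions not stated in the abstract: the subsamples are consecutive blocks of length b whose starting points are h apart, the data are i.i.d. and stored in random order (so consecutive blocks behave like random subsamples), and the estimator converges at rate sqrt(n). All function names are hypothetical. `subagging` averages the block estimates; `subsampling_ci` is the textbook subsampling interval built from sqrt(b) * (block estimate - full-sample estimate), not necessarily the paper's exact procedure.

```python
import numpy as np

def block_starts(n, b, h):
    """Start indices s = 0, h, 2h, ... of the blocks {s, ..., s + b - 1}."""
    return range(0, n - b + 1, h)

def subsample_estimates(x, estimator, b, h):
    """Estimator evaluated on each nonrandom block of length b."""
    x = np.asarray(x)
    return np.array([estimator(x[s:s + b]) for s in block_starts(len(x), b, h)])

def subagging(x, estimator, b, h):
    """Subagging: average of the block estimates."""
    return subsample_estimates(x, estimator, b, h).mean()

def subsampling_ci(x, estimator, b, h, alpha=0.05):
    """Equal-tailed subsampling interval, assuming a sqrt(n) convergence rate."""
    x = np.asarray(x)
    n = len(x)
    theta_n = estimator(x)
    roots = np.sqrt(b) * (subsample_estimates(x, estimator, b, h) - theta_n)
    lo_q, hi_q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_n - hi_q / np.sqrt(n), theta_n - lo_q / np.sqrt(n)

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)  # toy i.i.d. data with true median 0
print(subagging(x, np.median, b=1_000, h=1_000))
print(subsampling_ci(x, np.median, b=1_000, h=1_000))
```

Choosing h = b gives non-overlapping blocks and the cheapest computation; a smaller h reuses data across blocks at higher cost.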
Biometrika | Pub Date: 2023-03-16 | DOI: 10.1093/biomet/asad019
F. Sävje
Title: Causal inference with misspecified exposure mappings: separating definitions and assumptions
Abstract: Exposure mappings facilitate investigations of complex causal effects when units interact in experiments. Current methods require experimenters to use the same exposure mappings both to define the effect of interest and to impose assumptions on the interference structure. However, the two roles rarely coincide in practice, and experimenters are forced to make the often questionable assumption that their exposures are correctly specified. This paper argues that the two roles exposure mappings currently serve can, and typically should, be separated, so that exposures are used to define effects without necessarily assuming that they are capturing the complete causal structure in the experiment. The paper shows that this approach is practically viable by providing conditions under which exposure effects can be precisely estimated when the exposures are misspecified. Some important questions remain open.
Biometrika | Pub Date: 2023-03-15 | DOI: 10.1093/biomet/asad018
Giovanni Motta, W. Wu, M. Pourahmadi
Title: √2-Estimation for Smooth Eigenvectors of Matrix-Valued Functions
Abstract: Modern statistical methods for multivariate time series rely on the eigendecomposition of matrix-valued functions such as time-varying covariance and spectral density matrices. The curse of indeterminacy or misidentification of smooth eigenvector functions has not received much attention. We resolve this important problem and recover smooth trajectories by examining the distance between the eigenvectors of the same matrix-valued function evaluated at two consecutive points. We change the sign of the next eigenvector if its distance from the current one is larger than √2. In the case of distinct eigenvalues, this simple method delivers smooth eigenvectors. For coalescing eigenvalues, we match the corresponding eigenvectors and apply an additional signing around the coalescing points. We establish consistency and rates of convergence for the proposed smooth eigenvector estimators. Simulation results and applications to real data confirm that our approach is needed to obtain smooth eigenvectors.
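The sign rule described in this abstract is easy to sketch for the distinct-eigenvalue case. For unit vectors u and v, ||u - v||^2 = 2 - 2 u'v, so a distance larger than √2 is the same as a negative inner product, and flipping the next eigenvector makes consecutive vectors point the same way. The sketch assumes a toy 2 x 2 rotating covariance path (eigenvalues 3 and 1, leading eigenvector (cos t, sin t)); it does not handle coalescing eigenvalues, and `align_signs` and `sigma` are hypothetical names.

```python
import numpy as np

def align_signs(vectors):
    """Sequentially fix eigenvector signs along a grid.

    Flips v_{t+1} whenever ||v_{t+1} - v_t|| > sqrt(2); for unit vectors
    this is equivalent to a negative inner product <v_t, v_{t+1}>.
    """
    aligned = [np.asarray(vectors[0])]
    for v in vectors[1:]:
        v = np.asarray(v)
        if np.linalg.norm(v - aligned[-1]) > np.sqrt(2.0):
            v = -v
        aligned.append(v)
    return np.array(aligned)

def sigma(t):
    """Toy smooth 2x2 covariance path with distinct eigenvalues 3 and 1."""
    c, s = np.cos(t), np.sin(t)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([3.0, 1.0]) @ R.T

grid = np.linspace(0.0, np.pi, 200)
# eigh returns eigenvalues in ascending order, so column -1 is the leading
# eigenvector; the solver does not control its sign.
raw = np.array([np.linalg.eigh(sigma(t))[1][:, -1] for t in grid])
smooth = align_signs(raw)
print(np.all(np.sum(smooth[1:] * smooth[:-1], axis=1) > 0))  # True
```

The rule only fixes signs relative to the first grid point; the overall sign of the whole trajectory remains a convention.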
Biometrika | Pub Date: 2023-03-01 | Epub Date: 2022-04-05 | DOI: 10.1093/biomet/asac021
Jinsong Chen, Quefeng Li, Hua Yun Chen
Title: Testing generalized linear models with high-dimensional nuisance parameter
Volume 110(1), pages 83-99
Abstract: Generalized linear models often involve high-dimensional nuisance parameters, as seen in applications such as testing gene-environment interactions or gene-gene interactions. In these scenarios, it is essential to test the significance of a high-dimensional sub-vector of the model's coefficients. Although some existing methods can tackle this problem, they often rely on the bootstrap to approximate the asymptotic distribution of the test statistic, and thus are computationally expensive. Here, we propose a computationally efficient test with a closed-form limiting distribution, which allows the parameter being tested to be either sparse or dense. We show that under certain regularity conditions, the type I error of the proposed method is asymptotically correct, and we establish its power under high-dimensional alternatives. Extensive simulations demonstrate the good performance of the proposed test and its robustness when certain sparsity assumptions are violated. We also apply the proposed method to Chinese famine sample data in order to show its performance when testing the significance of gene-environment interactions.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9933885/pdf/