Biometrika Pub Date: 2024-06-22 DOI: 10.1093/biomet/asae027
Keyao Wei, Lengyang Wang, Yingcun Xia
{"title":"Testing serial dependence or cross dependence for time series with underreporting","authors":"Keyao Wei, Lengyang Wang, Yingcun Xia","doi":"10.1093/biomet/asae027","DOIUrl":"https://doi.org/10.1093/biomet/asae027","url":null,"abstract":"In practice, it is common for collected data to be underreported, which is particularly prevalent in fields such as social sciences, ecology and epidemiology. Drawing inferences from such data using conventional statistical methods can lead to incorrect conclusions. In this paper, we study tests for serial or cross dependence in time series data that are subject to underreporting. We introduce new test statistics, develop corresponding group-of-blocks bootstrap techniques, and establish their consistency. The methods are shown to be efficient by simulation and are used to identify key factors responsible for the spread of dengue fever and the occurrence of cardiovascular disease.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"197 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-05-13 DOI: 10.1093/biomet/asae023
Alexander Henzi, Michael Law
{"title":"A Rank-Based Sequential Test of Independence","authors":"Alexander Henzi, Michael Law","doi":"10.1093/biomet/asae023","DOIUrl":"https://doi.org/10.1093/biomet/asae023","url":null,"abstract":"We consider the problem of independence testing for two univariate random variables in a sequential setting. By leveraging recent developments on safe, anytime-valid inference, we propose a test with time-uniform type I error control and derive explicit bounds on the finite sample performance of the test. We demonstrate the empirical performance of the procedure in comparison to existing sequential and non-sequential independence tests. Furthermore, since the proposed test is distribution free under the null hypothesis, we empirically simulate the gap due to Ville’s inequality–the supermartingale analogue of Markov’s inequality–that is commonly applied to control type I error in anytime-valid inference, and apply this to construct a truncated sequential test.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"23 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-05-05 DOI: 10.1093/biomet/asae022
Cheng-Han Yang, Yu-Jen Cheng
{"title":"A model-free variable screening method for optimal treatment regimes with high-dimensional survival data","authors":"Cheng-Han Yang, Yu-Jen Cheng","doi":"10.1093/biomet/asae022","DOIUrl":"https://doi.org/10.1093/biomet/asae022","url":null,"abstract":"We propose a model-free variable screening method for the optimal treatment regime with high-dimensional survival data. The proposed screening method provides a unified framework to select the active variables in a prespecified target population, including the treated group as a special case. Based on this framework, the optimal treatment regime is exactly the optimal classifier that minimizes a weighted misclassification error rate, with weights associated with survival outcome variables, the censoring distribution, and a prespecified target population. Our main contribution involves reformulating the weighted classification problem into a classification problem within a hypothetical population, where the observed data can be viewed as a sample obtained from outcome-dependent sampling, with the selection probability inversely proportional to the weights. Consequently, we introduce the weighted Kolmogorov–Smirnov approach for selecting active variables in the optimal treatment regime, extending the conventional Kolmogorov–Smirnov method for binary classification. Additionally, the proposed screening method exhibits two levels of robustness. The first level of robustness is achieved because the proposed method does not require any model assumptions for survival outcome on treatment and covariates, whereas the other is attained as the form of treatment regimes is allowed to be unspecified even without requiring convex surrogate loss, such as logit loss or hinge loss. As a result, the proposed screening method is robust to model misspecifications, and nonparametric learning methods such as random forests and boosting can be applied to those selected variables for further analysis. The theoretical properties of the proposed method are established. The performance of the proposed method is examined through simulation studies and illustrated by a real dataset.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"46 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140883240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-04-13 DOI: 10.1093/biomet/asae021
Jeffrey Zhang, Dylan S Small, Siyu Heng
{"title":"Sensitivity analysis for matched observational studies with continuous exposures and binary outcomes","authors":"Jeffrey Zhang, Dylan S Small, Siyu Heng","doi":"10.1093/biomet/asae021","DOIUrl":"https://doi.org/10.1093/biomet/asae021","url":null,"abstract":"Matching is one of the most widely used study designs for adjusting for measured confounders in observational studies. However, unmeasured confounding may exist and cannot be removed by matching. Therefore, a sensitivity analysis is typically needed to assess a causal conclusion’s sensitivity to unmeasured confounding. Sensitivity analysis frameworks for binary exposures have been well-established for various matching designs and are commonly used in various studies. However, unlike the binary exposure case, there still lacks valid and general sensitivity analysis methods for continuous exposures, except in some special cases such as pair matching. To fill this gap in the binary outcome case, we develop a sensitivity analysis framework for general matching designs with continuous exposures and binary outcomes. First, we use probabilistic lattice theory to show our sensitivity analysis approach is finite-population-exact under Fisher’s sharp null. Second, we prove a novel design sensitivity formula as a powerful tool for asymptotically evaluating the performance of our sensitivity analysis approach. Third, to allow effect heterogeneity with binary outcomes, we introduce a framework for conducting asymptotically exact inference and sensitivity analysis on generalized attributable effects with binary outcomes via mixed-integer programming. Fourth, for the continuous outcomes case, we show that conducting an asymptotically exact sensitivity analysis in matched observational studies when both the exposures and outcomes are continuous is generally NP-hard, except in some special cases such as pair matching. As a real data application, we apply our new methods to study the effect of early-life lead exposure on juvenile delinquency. An implementation of the methods in this work is available in the R package doseSens.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140568514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-04-11 DOI: 10.1093/biomet/asae020
Erin E Gabriel, Michael C Sachs, Andreas Kryger Jensen
{"title":"Sharp symbolic nonparametric bounds for measures of benefit in observational and imperfect randomized studies with ordinal outcomes","authors":"Erin E Gabriel, Michael C Sachs, Andreas Kryger Jensen","doi":"10.1093/biomet/asae020","DOIUrl":"https://doi.org/10.1093/biomet/asae020","url":null,"abstract":"The probability of benefit is a valuable and meaningful measure of treatment effect, which has advantages over the average treatment effect. Particularly for an ordinal outcome, it has a better interpretation and can make apparent different aspects of the treatment impact. Unfortunately, this measure, and variations of it, are not identifiable even in randomized trials with perfect compliance. There is, for this reason, a long literature on nonparametric bounds for unidentifiable measures of benefit. These have primarily focused on perfect randomized trial settings and one or two specific estimands. We expand these bounds to observational settings with unmeasured confounders and imperfect randomized trials for all three estimands considered in the literature: the probability of benefit, the probability of no harm, and the relative treatment effect.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"49 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140568397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-04-08 DOI: 10.1093/biomet/asae015
J Zhang, F Xue, Q Xu, J Lee, A Qu
{"title":"Individualized dynamic model for multi-resolutional data","authors":"J Zhang, F Xue, Q Xu, J Lee, A Qu","doi":"10.1093/biomet/asae015","DOIUrl":"https://doi.org/10.1093/biomet/asae015","url":null,"abstract":"Mobile health has emerged as a major success for tracking individual health status, due to the popularity and power of smartphones and wearable devices. This has also brought great challenges in handling heterogeneous, multi-resolution data which arise ubiquitously in mobile health due to irregular multivariate measurements collected from individuals. In this paper, we propose an individualized dynamic latent factor model for irregular multi-resolution time series data to interpolate unsampled measurements of time series with low resolution. One major advantage of the proposed method is the capability to integrate multiple irregular time series and multiple subjects by mapping the multi-resolution data to the latent space. In addition, the proposed individualized dynamic latent factor model is applicable to capturing heterogeneous longitudinal information through individualized dynamic latent factors. Our theory provides a bound on the integrated interpolation error and the convergence rate for B-spline approximation methods. Both the simulation studies and the application to smartwatch data demonstrate the superior performance of the proposed method compared to existing methods.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"3 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140568582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-03-23 DOI: 10.1093/biomet/asae018
Jesse Hemerik, Aldo Solari, Jelle J Goeman
{"title":"Flexible control of the median of the false discovery proportion","authors":"Jesse Hemerik, Aldo Solari, Jelle J Goeman","doi":"10.1093/biomet/asae018","DOIUrl":"https://doi.org/10.1093/biomet/asae018","url":null,"abstract":"We introduce a multiple testing procedure that controls the median of the proportion of false discoveries in a flexible way. The procedure only requires a vector of p-values as input and is comparable to the Benjamini–Hochberg method, which controls the mean of the proportion of false discoveries. Our method allows free choice of one or several values of alpha after seeing the data, unlike the Benjamini–Hochberg procedure, which can be very anti-conservative when alpha is chosen post hoc. We prove these claims and illustrate them with simulations. Our procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneously median unbiased estimators of the proportion of false discoveries, valid for finite samples. This simultaneity allows for the claimed flexibility. Our approach does not assume independence. The time complexity of our method is linear in the number of hypotheses, after sorting the p-values.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"309 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-03-19 DOI: 10.1093/biomet/asae016
M J Stensrud, J D Laurendeau, A L Sarvet
{"title":"Optimal regimes for algorithm-assisted human decision-making","authors":"M J Stensrud, J D Laurendeau, A L Sarvet","doi":"10.1093/biomet/asae016","DOIUrl":"https://doi.org/10.1093/biomet/asae016","url":null,"abstract":"We consider optimal regimes for algorithm-assisted human decision-making. Such regimes are decision functions of measured pre-treatment variables and, by leveraging natural treatment values, enjoy a superoptimality property whereby they are guaranteed to outperform conventional optimal regimes. When there is unmeasured confounding, the benefit of using superoptimal regimes can be considerable. When there is no unmeasured confounding, superoptimal regimes are identical to conventional optimal regimes. Furthermore, identification of the expected outcome under superoptimal regimes in non-experimental studies requires the same assumptions as identification of value functions under conventional optimal regimes when the treatment is binary. To illustrate the utility of superoptimal regimes, we derive identification and estimation results in a common instrumental variable setting. We use these derivations to analyse examples from the optimal regimes literature, including a case study of the effect of prompt intensive care treatment on survival.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"309 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-02-26 DOI: 10.1093/biomet/asae012
A S DiLernia, M Fiecas, L Zhang
{"title":"Inference of partial correlations of a multivariate Gaussian time series","authors":"A S DiLernia, M Fiecas, L Zhang","doi":"10.1093/biomet/asae012","DOIUrl":"https://doi.org/10.1093/biomet/asae012","url":null,"abstract":"We derive an asymptotic joint distribution and novel covariance estimator for the partial correlations of a multivariate Gaussian time series given mild regularity conditions. Using our derived asymptotic distribution, we develop a Wald confidence interval and testing procedure for inference of individual partial correlations for time series data. Through simulation we demonstrate that our proposed confidence interval attains higher coverage rates, and our testing procedure attains false positive rates closer to the nominal levels than approaches that assume independent observations when autocorrelation is present.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"101 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139977684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika Pub Date: 2024-02-24 DOI: 10.1093/biomet/asae011
Y Hu, W Wang
{"title":"Network-adjusted covariates for community detection","authors":"Y Hu, W Wang","doi":"10.1093/biomet/asae011","DOIUrl":"https://doi.org/10.1093/biomet/asae011","url":null,"abstract":"Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e., covariates. Existing methods have shown the effectiveness of using covariates on the low-degree nodes, but rarely discuss the case where communities have significantly different density levels, i.e. multiscale networks. In this paper, we introduce a novel method that addresses this challenge by constructing network-adjusted covariates, which leverage the network connections and covariates with a node-specific weight for each node. This weight can be calculated without tuning parameters. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification and multiple sparse communities. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, and it shows our method is optimal for connection intensity up to a constant factor. Our method outperforms existing approaches in simulations and a LastFM app user network. We then compare our method with others on a statistics publication citation network where 30% of nodes are isolated, and our method produces reasonable and balanced results.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"35 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139950512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}