{"title":"Resampling-based confidence intervals and bands for the average treatment effect in observational studies with competing risks","authors":"Jasmin Rühl, Sarah Friedrich","doi":"10.1007/s11222-024-10420-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10420-w","url":null,"abstract":"<p>The g-formula can be used to estimate the treatment effect while accounting for confounding bias in observational studies. With regard to time-to-event endpoints, possibly subject to competing risks, the construction of valid pointwise confidence intervals and time-simultaneous confidence bands for the causal risk difference is complicated, however. A convenient solution is to approximate the asymptotic distribution of the corresponding stochastic process by means of resampling approaches. In this paper, we consider three different resampling methods, namely the classical nonparametric bootstrap, the influence function equipped with a resampling approach as well as a martingale-based bootstrap version, the so-called wild bootstrap. For the latter, three sub-versions based on differing distributions of the underlying random multipliers are examined. We set up a simulation study to compare the accuracy of the different techniques, which reveals that the wild bootstrap should in general be preferred if the sample size is moderate and sufficient data on the event of interest have been accrued. 
For illustration, the resampling methods are further applied to data on the long-term survival in patients with early-stage Hodgkin’s disease.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A constant-per-iteration likelihood ratio test for online changepoint detection for exponential family models","authors":"Kes Ward, Gaetano Romano, Idris Eckley, Paul Fearnhead","doi":"10.1007/s11222-024-10416-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10416-6","url":null,"abstract":"<p>Online changepoint detection algorithms that are based on (generalised) likelihood-ratio tests have been shown to have excellent statistical properties. However, a simple online implementation is computationally infeasible as, at time <i>T</i>, it involves considering <i>O</i>(<i>T</i>) possible locations for the change. Recently, the FOCuS algorithm has been introduced for detecting changes in mean in Gaussian data that decreases the per-iteration cost to <i>O</i>(log <i>T</i>). This is possible by using pruning ideas, which reduce the set of changepoint locations that need to be considered at time <i>T</i> to approximately log <i>T</i>. We show that if one wishes to perform the likelihood ratio test for a different one-parameter exponential family model, then exactly the same pruning rule can be used, and again one need only consider approximately log <i>T</i> locations at iteration <i>T</i>. Furthermore, we show how we can adaptively perform the maximisation step of the algorithm so that we need only maximise the test statistic over a small subset of these possible locations. 
Empirical results show that the resulting online algorithm, which can detect changes under a wide range of models, has a constant-per-iteration cost on average.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140168815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving model choice in classification: an approach based on clustering of covariance matrices","authors":"David Rodríguez-Vítores, Carlos Matrán","doi":"10.1007/s11222-024-10410-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10410-y","url":null,"abstract":"<p>This work introduces a refinement of the Parsimonious Model for fitting a Gaussian Mixture. The improvement is based on the consideration of clusters of the involved covariance matrices according to a criterion, such as sharing Principal Directions. This and other similarity criteria that arise from the spectral decomposition of a matrix are the bases of the Parsimonious Model. We show that such groupings of covariance matrices can be achieved through simple modifications of the CEM (Classification Expectation Maximization) algorithm. Our approach leads to propose Gaussian Mixture Models for model-based clustering and discriminant analysis, in which covariance matrices are clustered according to a parsimonious criterion, creating intermediate steps between the fourteen widely known parsimonious models. The added versatility not only allows us to obtain models with fewer parameters for fitting the data, but also provides greater interpretability. 
We show its usefulness for model-based clustering and discriminant analysis, providing algorithms to find approximate solutions verifying suitable size, shape and orientation constraints, and applying them to both simulation and real data examples.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional mixtures-of-experts","authors":"Faïcel Chamroukhi, Nhat Thien Pham, Van Hà Hoang, Geoffrey J. McLachlan","doi":"10.1007/s11222-023-10379-0","DOIUrl":"https://doi.org/10.1007/s11222-023-10379-0","url":null,"abstract":"<p>We consider the statistical analysis of heterogeneous data for prediction, in situations where the observations include functions, typically time series. We extend the modeling with mixtures-of-experts (ME), as a framework of choice in modeling heterogeneity in data for prediction with vectorial observations, to this functional data analysis context. We first present a new family of ME models, named functional ME (FME), in which the predictors are potentially noisy observations, from entire functions. Furthermore, the data generating process of the predictor and the real response, is governed by a hidden discrete variable representing an unknown partition. Second, by imposing sparsity on derivatives of the underlying functional parameters via Lasso-like regularizations, we provide sparse and interpretable functional representations of the FME models called iFME. We develop dedicated expectation–maximization algorithms for Lasso-like regularized maximum-likelihood parameter estimation strategies to fit the models. 
The proposed models and algorithms are studied in simulated scenarios and in applications to two real data sets, and the obtained results demonstrate their performance in accurately capturing complex nonlinear relationships and in clustering the heterogeneous regression data.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expectile and M-quantile regression for panel data","authors":"Ian Meneghel Danilevicz, Valdério Anselmo Reisen, Pascal Bondon","doi":"10.1007/s11222-024-10396-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10396-7","url":null,"abstract":"<p>Linear fixed effect models are a general way to fit panel or longitudinal data with a distinct intercept for each unit. Based on expectile and M-quantile approaches, we propose alternative regression estimation methods to estimate the parameters of linear fixed effect models. The estimation functions are penalized by the least absolute shrinkage and selection operator to reduce the dimensionality of the data. Some asymptotic properties of the estimators are established, and finite sample size investigations are conducted to verify the empirical performances of the estimation methods. The computational implementations of the procedures are discussed, and real economic panel data from the Organisation for Economic Cooperation and Development are analyzed to show the usefulness of the methods in a practical problem.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matrix regression heterogeneity analysis","authors":"Fengchuan Zhang, Sanguo Zhang, Shi-Ming Li, Mingyang Ren","doi":"10.1007/s11222-024-10401-z","DOIUrl":"https://doi.org/10.1007/s11222-024-10401-z","url":null,"abstract":"<p>The development of modern science and technology has facilitated the collection of a large amount of matrix data in fields such as biomedicine. Matrix data modeling has been extensively studied, which advances from the naive approach of flattening the matrix into a vector. However, existing matrix modeling methods mainly focus on homogeneous data, failing to handle the data heterogeneity frequently encountered in the biomedical field, where samples from the same study belong to several underlying subgroups, and different subgroups follow different models. In this paper, we focus on regression-based heterogeneity analysis. We propose a matrix data heterogeneity analysis framework, by combining matrix bilinear sparse decomposition and penalized fusion techniques, which enables data-driven subgroup detection, including determining the number of subgroups and subgrouping membership. A rigorous theoretical analysis is conducted, including asymptotic consistency in terms of subgroup detection, the number of subgroups, and regression coefficients. 
Numerous numerical studies based on simulated and real data have been constructed, showcasing the superior performance of the proposed method in analyzing matrix heterogeneous data.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Doubly robust estimation of optimal treatment regimes for survival data using an instrumental variable","authors":"Xia Junwen, Zhan Zishu, Zhang Jingxiao","doi":"10.1007/s11222-024-10407-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10407-7","url":null,"abstract":"<p>In survival contexts, substantial literature exists on estimating optimal treatment regimes, where treatments are assigned based on personal characteristics to maximize the survival probability. These methods assume that a set of covariates is sufficient to deconfound the treatment-outcome relationship. However, this assumption can be limited in observational studies or randomized trials in which non-adherence occurs. Therefore, we propose a novel approach to estimating optimal treatment regimes when certain confounders are unobservable and a binary instrumental variable is available. Specifically, via a binary instrumental variable, we propose a semiparametric estimator for optimal treatment regimes by maximizing a Kaplan–Meier-like estimator of the survival function. Furthermore, to increase resistance to model misspecification, we construct novel doubly robust estimators. Since the estimators of the survival function are jagged, we incorporate kernel smoothing methods to improve performance. Under appropriate regularity conditions, the asymptotic properties are rigorously established. Moreover, the finite sample performance is evaluated through simulation studies. 
Finally, we illustrate our method using data from the National Cancer Institute’s prostate, lung, colorectal, and ovarian cancer screening trial.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140156495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantile ratio regression","authors":"Alessio Farcomeni, Marco Geraci","doi":"10.1007/s11222-024-10406-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10406-8","url":null,"abstract":"<p>We introduce quantile ratio regression. Our proposed model assumes that the ratio of two arbitrary quantiles of a continuous response distribution is a function of a linear predictor. Thanks to basic quantile properties, estimation can be carried out on the scale of either the response or the link function. The advantage of using the latter becomes tangible when implementing fast optimizers for linear regression in the presence of large datasets. We show the theoretical properties of the estimator and derive an efficient method to obtain standard errors. The good performance and merit of our methods are illustrated by means of a simulation study and a real data analysis, where we investigate income inequality in the European Union (EU) using data from a sample of about two million households. We find a significant association between inequality, as measured by quantile ratios, and certain macroeconomic indicators, and we identify countries with outlying income inequality relative to the rest of the EU. An <span>R</span> implementation of the proposed methods is freely available.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust score matching for compositional data","authors":"Janice L. Scealy, Kassel L. Hingee, John T. Kent, Andrew T. A. Wood","doi":"10.1007/s11222-024-10412-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10412-w","url":null,"abstract":"<p>The restricted polynomially-tilted pairwise interaction (RPPI) distribution gives a flexible model for compositional data. It is particularly well-suited to situations where some of the marginal distributions of the components of a composition are concentrated near zero, possibly with right skewness. This article develops a method of tractable robust estimation for the model by combining two ideas. The first idea is to use score matching estimation after an additive log-ratio transformation. The resulting estimator is automatically insensitive to zeros in the data compositions. The second idea is to incorporate suitable weights in the estimating equations. The resulting estimator is additionally resistant to outliers. These properties are confirmed in simulation studies where we further also demonstrate that our new outlier-robust estimator is efficient in high concentration settings, even in the case when there is no model contamination. An example is given using microbiome data. A user-friendly R package accompanies the article.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140116841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantile generalized measures of correlation","authors":"Xinyu Zhang, Hongwei Shi, Niwen Zhou, Falong Tan, Xu Guo","doi":"10.1007/s11222-024-10414-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10414-8","url":null,"abstract":"<p>In this paper, we introduce a quantile Generalized Measure of Correlation (GMC) to describe the nonlinear quantile relationship between the response variable and predictors. The introduced correlation takes values between zero and one. It is zero if and only if the conditional quantile function is equal to the unconditional quantile. We also introduce a quantile partial Generalized Measure of Correlation. Estimators of these correlations are developed. Notably, by adopting machine learning methods, our estimation procedures allow the dimension of predictors to be very large. Under mild conditions, we establish the estimators’ consistency. For the construction of confidence intervals, we adopt sample splitting and show that the corresponding estimators are asymptotically normal. We also consider composite quantile GMC by integrating information from different quantile levels. Numerical studies are conducted to illustrate our methods. Moreover, we apply our methods to analyze genome-wide association study data from Carworth Farms White mice.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140116788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}