{"title":"Doubly robust estimation of optimal treatment regimes for survival data using an instrumental variable","authors":"Xia Junwen, Zhan Zishu, Zhang Jingxiao","doi":"10.1007/s11222-024-10407-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10407-7","url":null,"abstract":"<p>In survival contexts, substantial literature exists on estimating optimal treatment regimes, where treatments are assigned based on personal characteristics to maximize the survival probability. These methods assume that a set of covariates is sufficient to deconfound the treatment-outcome relationship. However, this assumption can be limited in observational studies or randomized trials in which non-adherence occurs. Therefore, we propose a novel approach to estimating optimal treatment regimes when certain confounders are unobservable and a binary instrumental variable is available. Specifically, via a binary instrumental variable, we propose a semiparametric estimator for optimal treatment regimes by maximizing a Kaplan–Meier-like estimator of the survival function. Furthermore, to increase resistance to model misspecification, we construct novel doubly robust estimators. Since the estimators of the survival function are jagged, we incorporate kernel smoothing methods to improve performance. Under appropriate regularity conditions, the asymptotic properties are rigorously established. Moreover, the finite sample performance is evaluated through simulation studies. Finally, we illustrate our method using data from the National Cancer Institute’s prostate, lung, colorectal, and ovarian cancer screening trial.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"153 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140156495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantile ratio regression","authors":"Alessio Farcomeni, Marco Geraci","doi":"10.1007/s11222-024-10406-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10406-8","url":null,"abstract":"<p>We introduce quantile ratio regression. Our proposed model assumes that the ratio of two arbitrary quantiles of a continuous response distribution is a function of a linear predictor. Thanks to basic quantile properties, estimation can be carried out on the scale of either the response or the link function. The advantage of using the latter becomes tangible when implementing fast optimizers for linear regression in the presence of large datasets. We show the theoretical properties of the estimator and derive an efficient method to obtain standard errors. The good performance and merit of our methods are illustrated by means of a simulation study and a real data analysis; where we investigate income inequality in the European Union (EU) using data from a sample of about two million households. We find a significant association between inequality, as measured by quantile ratios, and certain macroeconomic indicators; and we identify countries with outlying income inequality relative to the rest of the EU. An <span>R</span> implementation of the proposed methods is freely available.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"23 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Janice L. Scealy, Kassel L. Hingee, John T. Kent, Andrew T. A. Wood
{"title":"Robust score matching for compositional data","authors":"Janice L. Scealy, Kassel L. Hingee, John T. Kent, Andrew T. A. Wood","doi":"10.1007/s11222-024-10412-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10412-w","url":null,"abstract":"<p>The restricted polynomially-tilted pairwise interaction (RPPI) distribution gives a flexible model for compositional data. It is particularly well-suited to situations where some of the marginal distributions of the components of a composition are concentrated near zero, possibly with right skewness. This article develops a method of tractable robust estimation for the model by combining two ideas. The first idea is to use score matching estimation after an additive log-ratio transformation. The resulting estimator is automatically insensitive to zeros in the data compositions. The second idea is to incorporate suitable weights in the estimating equations. The resulting estimator is additionally resistant to outliers. These properties are confirmed in simulation studies where we further also demonstrate that our new outlier-robust estimator is efficient in high concentration settings, even in the case when there is no model contamination. An example is given using microbiome data. A user-friendly R package accompanies the article.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"1 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140116841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantile generalized measures of correlation","authors":"Xinyu Zhang, Hongwei Shi, Niwen Zhou, Falong Tan, Xu Guo","doi":"10.1007/s11222-024-10414-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10414-8","url":null,"abstract":"<p>In this paper, we introduce a quantile Generalized Measure of Correlation (GMC) to describe nonlinear quantile relationship between response variable and predictors. The introduced correlation takes values between zero and one. It is zero if and only if the conditional quantile function is equal to the unconditional quantile. We also introduce a quantile partial Generalized Measure of Correlation. Estimators of these correlations are developed. Notably by adopting machine learning methods, our estimation procedures allow the dimension of predictors very large. Under mild conditions, we establish the estimators’ consistency. For construction of confidence interval, we adopt sample splitting and show that the corresponding estimators are asymptotic normal. We also consider composite quantile GMC by integrating information from different quantile levels. Numerical studies are conducted to illustrate our methods. Moreover, we apply our methods to analyze genome-wide association study data from Carworth Farms White mice.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140116788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian variable selection for matrix autoregressive models","authors":"Alessandro Celani, Paolo Pagnottoni, Galin Jones","doi":"10.1007/s11222-024-10402-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10402-y","url":null,"abstract":"<p>A Bayesian method is proposed for variable selection in high-dimensional matrix autoregressive models which reflects and exploits the original matrix structure of data to (a) reduce dimensionality and (b) foster interpretability of multidimensional relationship structures. A compact form of the model is derived which facilitates the estimation procedure and two computational methods for the estimation are proposed: a Markov chain Monte Carlo algorithm and a scalable Bayesian EM algorithm. Being based on the spike-and-slab framework for fast posterior mode identification, the latter enables Bayesian data analysis of matrix-valued time series at large scales. The theoretical properties, comparative performance, and computational efficiency of the proposed model is investigated through simulated examples and an application to a panel of country economic indicators.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"1 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale correlation screening under dependence for brain functional connectivity network inference","authors":"Hanâ Lbath, Alexander Petersen, Sophie Achard","doi":"10.1007/s11222-024-10411-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10411-x","url":null,"abstract":"<p>Data produced by resting-state functional Magnetic Resonance Imaging are widely used to infer brain functional connectivity networks. Such networks correlate neural signals to connect brain regions, which consist in groups of dependent voxels. Previous work has focused on aggregating data across voxels within predefined regions. However, the presence of within-region correlations has noticeable impacts on inter-regional correlation detection, and thus edge identification. To alleviate them, we propose to leverage techniques from the large-scale correlation screening literature, and derive simple and practical characterizations of the mean number of correlation discoveries that flexibly incorporate intra-regional dependence structures. A connectivity network inference framework is then presented. First, inter-regional correlation distributions are estimated. Then, correlation thresholds that can be tailored to one’s application are constructed for each edge. Finally, the proposed framework is implemented on synthetic and real-world datasets. This novel approach for handling arbitrary intra-regional correlation is shown to limit false positives while improving true positive rates.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"5 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple-output quantile regression neural network","authors":"Ruiting Hao, Xiaorong Yang","doi":"10.1007/s11222-024-10408-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10408-6","url":null,"abstract":"<p>Quantile regression neural network (QRNN) model has received increasing attention in various fields to provide conditional quantiles of responses. However, almost all the available literature about QRNN is devoted to handling the case with one-dimensional responses, which presents a great limitation when we focus on the quantiles of multivariate responses. To deal with this issue, we propose a novel multiple-output quantile regression neural network (MOQRNN) model in this paper to estimate the conditional quantiles of multivariate data. The MOQRNN model is constructed by the following steps. Step 1 acquires the conditional distribution of multivariate responses by a nonparametric method. Step 2 obtains the optimal transport map that pushes the spherical uniform distribution forward to the conditional distribution through the input convex neural network (ICNN). Step 3 provides the conditional quantile contours and regions by the ICNN-based optimal transport map. In both simulation studies and real data application, comparative analyses with the existing method demonstrate that the proposed MOQRNN model is more appealing to yield excellent quantile contours, which are not only smoother but also closer to their theoretical counterparts.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"281 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Total effects with constrained features","authors":"","doi":"10.1007/s11222-024-10398-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10398-5","url":null,"abstract":"<h3>Abstract</h3> <p>Recent studies have emphasized the connection between machine learning feature importance measures and total order sensitivity indices (total effects, henceforth). Feature correlations and the need to avoid unrestricted permutations make the estimation of these indices challenging. Additionally, there is no established theory or approach for non-Cartesian domains. We propose four alternative strategies for computing total effects that account for both dependent and constrained features. Our first approach involves a generalized winding stairs design combined with the Knothe-Rosenblatt transformation. This approach, while applicable to a wide family of input dependencies, becomes impractical when inputs are physically constrained. Our second approach is a U-statistic that combines the Jansen estimator with a weighting factor. The U-statistic framework allows the derivation of a central limit theorem for this estimator. However, this design is computationally intensive. Then, our third approach uses derangements to significantly reduce computational burden. We prove consistency and central limit theorems for these estimators as well. Our fourth approach is based on a nearest-neighbour intuition and it further reduces computational burden. We test these estimators through a series of increasingly complex computational experiments with features constrained on compact and connected domains (circle, simplex), non-compact and non-connected domains (Sierpinski gaskets), we provide comparisons with machine learning approaches and conclude with an application to a realistic simulator.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"8 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140035815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of regime-switching diffusions via Fourier transforms","authors":"Thomas Lux","doi":"10.1007/s11222-024-10397-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10397-6","url":null,"abstract":"<p>In this article, an algorithm for maximum-likelihood estimation of regime-switching diffusions is proposed. The proposed approach uses a Fourier transform to numerically solve the system of Fokker–Planck or forward Kolmogorow equations for the temporal evolution of the state densities. Monte Carlo simulations confirm the theoretically expected consistency of this approach for moderate sample sizes and its practical feasibility for certain regime-switching diffusions used in economics and biology with moderate numbers of states and parameters. An application to animal movement data serves as an illustration of the proposed algorithm.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"10 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140035718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional sparse single–index regression via Hilbert–Schmidt independence criterion","authors":"Xin Chen, Chang Deng, Shuaida He, Runxiong Wu, Jia Zhang","doi":"10.1007/s11222-024-10399-4","DOIUrl":"https://doi.org/10.1007/s11222-024-10399-4","url":null,"abstract":"<p>Hilbert-Schmidt Independence Criterion (HSIC) has recently been introduced to the field of single-index models to estimate the directions. Compared with other well-established methods, the HSIC based method requires relatively weak conditions. However, its performance has not yet been studied in the prevalent high-dimensional scenarios, where the number of covariates can be much larger than the sample size. In this article, based on HSIC, we propose to estimate the possibly sparse directions in the high-dimensional single-index models through a parameter reformulation. Our approach estimates the subspace of the direction directly and performs variable selection simultaneously. Due to the non-convexity of the objective function and the complexity of the constraints, a majorize-minimize algorithm together with the linearized alternating direction method of multipliers is developed to solve the optimization problem. Since it does not involve the inverse of the covariance matrix, the algorithm can naturally handle large <i>p</i> small <i>n</i> scenarios. Through extensive simulation studies and a real data analysis, we show that our proposal is efficient and effective in the high-dimensional settings. The <span>(texttt {Matlab})</span> codes for this method are available online.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"6 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140005016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}