{"title":"Differentially private estimation of weighted average treatment effects for binary outcomes","authors":"Sharmistha Guha , Jerome P. Reiter","doi":"10.1016/j.csda.2025.108145","DOIUrl":"10.1016/j.csda.2025.108145","url":null,"abstract":"<div><div>In the social and health sciences, researchers often make causal inferences using sensitive variables. These researchers, as well as the data holders themselves, may be ethically and perhaps legally obligated to protect the confidentiality of study participants' data. It is now known that releasing any statistics, including estimates of causal effects, computed with confidential data leaks information about the underlying data values. Thus, analysts may desire to use causal estimators that can provably bound this information leakage. Motivated by this goal, new algorithms are developed for estimating weighted average treatment effects with binary outcomes that satisfy the criterion of differential privacy. Theoretical results are presented on the accuracy of several differentially private estimators of weighted average treatment effects. Empirical evaluations using simulated data and a causal analysis involving education and income data illustrate the performance of these estimators.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108145"},"PeriodicalIF":1.5,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stratified distance space improves the efficiency of sequential samplers for approximate Bayesian computation","authors":"Henri Pesonen , Jukka Corander","doi":"10.1016/j.csda.2025.108141","DOIUrl":"10.1016/j.csda.2025.108141","url":null,"abstract":"<div><div>Approximate Bayesian computation (ABC) methods are standard tools for inferring parameters of complex models when the likelihood function is analytically intractable. A popular approach to improving the poor acceptance rate of the basic rejection sampling ABC algorithm is to use sequential Monte Carlo (ABC SMC) to produce a sequence of proposal distributions adapting towards the posterior, instead of generating values from the prior distribution of the model parameters. The proposal distribution for the subsequent iteration is typically obtained from a weighted set of samples, often called particles, of the current iteration of this sequence. Current methods for constructing these proposal distributions treat all the particles equivalently, regardless of the corresponding value generated by the sampler, which may lead to inefficiency when propagating the information across iterations of the algorithm. To improve sampler efficiency, a modified approach called stratified distance ABC SMC is introduced. The algorithm stratifies particles based on the distance between their corresponding synthetic and observed data, and then constructs distinct proposal distributions for all the strata. Taking into account the distribution of distances across the particle space leads to a substantially improved acceptance rate of the rejection sampling. It is shown that further efficiency can be gained by using a newly proposed stopping rule for the sequential process based on the stratified posterior samples, and these advances are demonstrated by several examples.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108141"},"PeriodicalIF":1.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Confidence intervals for tree-structured varying coefficients","authors":"Nikolai Spuck , Matthias Schmid , Malte Monin , Moritz Berger","doi":"10.1016/j.csda.2025.108142","DOIUrl":"10.1016/j.csda.2025.108142","url":null,"abstract":"<div><div>The tree-structured varying coefficient (TSVC) model is a flexible regression approach that allows the effects of covariates to vary with the values of the effect modifiers. Relevant effect modifiers are identified inherently using recursive partitioning techniques. To quantify uncertainty in TSVC models, a procedure to construct confidence intervals of the estimated partition-specific coefficients is proposed. This task constitutes a selective inference problem as the coefficients of a TSVC model result from data-driven model building. To account for this issue, a parametric bootstrap approach, which is tailored to the complex structure of TSVC, is introduced. Finite sample properties, particularly coverage proportions, of the proposed confidence intervals are evaluated in a simulation study. For illustration, applications to data from COVID-19 patients and from patients suffering from acute odontogenic infection are considered. The proposed approach may also be adapted for constructing confidence intervals for other tree-based methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108142"},"PeriodicalIF":1.5,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient computation of sparse and robust maximum association estimators","authors":"Pia Pfeiffer , Andreas Alfons , Peter Filzmoser","doi":"10.1016/j.csda.2025.108133","DOIUrl":"10.1016/j.csda.2025.108133","url":null,"abstract":"<div><div>Robust statistical estimators offer resilience against outliers but are often computationally challenging, particularly in high-dimensional sparse settings. Modern optimization techniques are utilized for robust sparse association estimators without imposing constraints on the covariance structure. The approach splits the problem into a robust estimation phase, followed by optimization of a decoupled, biconvex problem to derive the sparse canonical vectors. An augmented Lagrangian algorithm, combined with a modified adaptive gradient descent method, induces sparsity through simultaneous updates of both canonical vectors. Results demonstrate improved precision over existing methods, with high-dimensional empirical examples illustrating the effectiveness of this approach. The methodology can also be extended to other robust sparse estimators.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108133"},"PeriodicalIF":1.5,"publicationDate":"2025-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional time transformation model with applications to digital health","authors":"Rahul Ghosal , Marcos Matabuena , Sujit K. Ghosh","doi":"10.1016/j.csda.2025.108131","DOIUrl":"10.1016/j.csda.2025.108131","url":null,"abstract":"<div><div>The advent of wearable and sensor technologies now leads to functional predictors which are intrinsically infinite-dimensional. While the existing approaches for functional data and survival outcomes lean on the well-established Cox model, the proportional hazard (PH) assumption might not always be suitable in real-world applications. Motivated by physiological signals encountered in digital medicine, a more general and flexible functional time-transformation model is developed for estimating the conditional survival function with both functional and scalar covariates. A partially functional regression model is used to directly model the survival time on the covariates through an unknown monotone transformation and a known error distribution. Bernstein polynomials are used to model the monotone transformation function and the smooth functional coefficients. A sieve method of maximum likelihood is employed for estimation. Numerical simulations illustrate a satisfactory performance of the proposed method in estimation and inference. The application of the proposed model is demonstrated through two case studies involving wearable data: (i) understanding the association between diurnal physical activity patterns and all-cause mortality based on accelerometer data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014, and (ii) modelling time-to-hypoglycemia events in a cohort of diabetic patients based on a distributional representation of continuous glucose monitoring (CGM) data. The results provide important epidemiological insights into the direct association between survival times and the physiological signals, and also exhibit superior predictive performance compared to traditional summary-based biomarkers in the CGM study.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"207 ","pages":"Article 108131"},"PeriodicalIF":1.5,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully nonparametric inverse probability weighting estimation with nonignorable missing data and its extension to missing quantile regression","authors":"Lingnan Tai , Li Tao , Jianxin Pan , Man-lai Tang , Keming Yu , Wolfgang Karl Härdle , Maozai Tian","doi":"10.1016/j.csda.2025.108127","DOIUrl":"10.1016/j.csda.2025.108127","url":null,"abstract":"<div><div>In practical data analysis, the not-missing-at-random (NMAR) mechanism is typically more aligned with the natural causes of missing data. The NMAR mechanism is, however, complex and flexible, exceeding the capabilities of classical methods for addressing this missing data challenge. A comprehensive analysis framework for the NMAR problem is established, and a novel inverse probability weighting method based on the fully nonparametric exponential tilting model and sieve minimum distance is constructed. Additionally, given the broad field of applications for the quantile regression model, fully nonparametric inverse probability weighting and augmented inverse probability weighting estimators for quantile regression under NMAR are introduced. Simulation studies demonstrate that the proposed methods are better suited for various flexible propensity score functions. For illustration, the proposed methods are applied to the AIDS Clinical Trials Group Study 175 data to examine the effectiveness of treatments on HIV-infected subjects.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108127"},"PeriodicalIF":1.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantile feature screening for infinite dimensional data under FDR control","authors":"Zhentao Tian, Zhongzhan Zhang","doi":"10.1016/j.csda.2025.108132","DOIUrl":"10.1016/j.csda.2025.108132","url":null,"abstract":"<div><div>This study is focused on the detection of effects of features on an infinite dimensional response through the conditional spatial quantiles (CSQ) of the response given the features, and develops a novel model-free feature screening procedure for the CSQ regression function. Firstly, a new metric named kernel-based conditional quantile dependence (KCQD) is proposed to measure the dependence of the CSQ on a feature. The metric equals 0 if and only if the feature is independent of the CSQ of the response, and thus is employed to detect the contribution of a feature. Then a two-step feature screening procedure with the estimated KCQD scores is developed via a distributed strategy. Theoretical analyses reveal that the new two-step screening method not only has screening consistency and sure screening properties but also achieves control over false discovery rate (FDR). Simulation studies show its ability to control the expected FDR level while maintaining high screening power. The proposed procedure is applied to analyze a magnetoencephalography dataset, and the identified signal positions are anatomically interpretable.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108132"},"PeriodicalIF":1.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Markov model for estimating the cost-effectiveness of immunotherapy for newly diagnosed multiple myeloma patients","authors":"Massimo Bilancia , Antonio Giovanni Solimando , Fabio Manca , Angelo Vacca , Roberto Ria","doi":"10.1016/j.csda.2025.108130","DOIUrl":"10.1016/j.csda.2025.108130","url":null,"abstract":"<div><div>Multiple myeloma (MM) is a malignancy of plasma cells, originating from B lymphocytes and accumulating within the bone marrow. The prevalence of MM has increased in industrialized countries, representing 1-1.8% of all cancers and 15% of hematologic malignancies. Immunotherapy has broadened therapeutic options for MM, offering treatments with generally improved efficacy and reduced toxicity compared to conventional therapies. Daratumumab, a monoclonal antibody recently granted regulatory approval, exemplifies this advancement, demonstrating improved patient outcomes. However, the substantial cost of daratumumab has significantly increased per-patient treatment expenditures. Consequently, the economic burden associated with this new class of therapies warrants careful evaluation of their cost-effectiveness. To address this, a six-state non-stationary Markov model was developed for cost-effectiveness analysis of immunotherapy in newly diagnosed MM patients and, more broadly, in the oncohematological patient population. This model aims to provide healthcare professionals and policymakers with actionable insights into cost-effective interventions, supporting informed decisions regarding optimal treatment strategies.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108130"},"PeriodicalIF":1.5,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extremal local linear quantile regression for nonlinear dependent processes","authors":"Fengyang He , Huixia Judy Wang","doi":"10.1016/j.csda.2025.108128","DOIUrl":"10.1016/j.csda.2025.108128","url":null,"abstract":"<div><div>Estimating extreme conditional quantiles accurately in the presence of data sparsity in the tails is a challenging and important problem. While there is existing literature on quantile analysis, limited work has been done on capturing nonlinear relationships in dependent data structures for extreme quantile estimation. A novel estimation procedure is proposed that combines the local linear quantile regression method and extreme value theory. A new enhanced Hill estimator for the conditional extreme value index is developed, constructed from the local linear quantile estimators at a sequence of quantile levels. This approach allows data-adaptive weights to be assigned to different quantiles, providing flexibility and the potential to enhance estimation efficiency. Furthermore, an estimator for extreme conditional quantiles is proposed by extrapolating from the intermediate quantiles. The methodology enables both point and interval estimation of extreme conditional quantiles for processes with an <em>α</em>-mixing dependence structure. The Bahadur representation of the intermediate quantile estimators is derived within the local linear extreme-quantile framework, and the asymptotic properties of the proposed estimators are established. Simulation studies and real data analysis demonstrate the effectiveness and performance of the proposed methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108128"},"PeriodicalIF":1.5,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneity-aware transfer learning for high-dimensional linear regression models","authors":"Yanjin Peng, Lei Wang","doi":"10.1016/j.csda.2025.108129","DOIUrl":"10.1016/j.csda.2025.108129","url":null,"abstract":"<div><div>Transfer learning can refine the performance of a target model through utilizing beneficial information from relevant source datasets. In practice, however, auxiliary samples may be collected from different sub-populations with non-negligible heterogeneity. In this paper we assume that each dataset involves a common parameter vector and dataset-specific nuisance parameters and extend the transfer learning framework to account for heterogeneous models. Specifically, we adapt the decorrelated score technique to deal with the dataset-specific nuisance parameters and develop a strategy to leverage possible shared information from relevant source datasets. To avoid negative transfer, a completely data-driven algorithm is provided to determine the transferable sources. The convergence rate of the proposed estimator is investigated and the source detection consistency is also verified. Extensive numerical experiments are conducted to evaluate the proposed transfer learning algorithms, and an application to the Genotype-Tissue Expression dataset is exhibited.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108129"},"PeriodicalIF":1.5,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}