{"title":"Fully nonparametric inverse probability weighting estimation with nonignorable missing data and its extension to missing quantile regression","authors":"Lingnan Tai , Li Tao , Jianxin Pan , Man-lai Tang , Keming Yu , Wolfgang Karl Härdle , Maozai Tian","doi":"10.1016/j.csda.2025.108127","DOIUrl":"10.1016/j.csda.2025.108127","url":null,"abstract":"<div><div>In practical data analysis, the not-missing-at-random (NMAR) mechanism is typically more aligned with the natural causes of missing data. Because the NMAR mechanism is complex and flexible, it often exceeds the capabilities of classical missing-data methods. A comprehensive analysis framework for the NMAR problem is established, and a novel inverse probability weighting method based on the fully nonparametric exponential tilting model and sieve minimum distance is constructed. Additionally, given the wide range of applications of the quantile regression model, fully nonparametric inverse probability weighting and augmented inverse probability weighting for estimating quantile regression under NMAR are introduced. Simulation studies demonstrate that the proposed methods are better suited for various flexible propensity score functions. 
In practical applications, our methods are applied to the AIDS Clinical Trials Group Study 175 data to examine the effectiveness of treatments on HIV-infected subjects.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108127"},"PeriodicalIF":1.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
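The record above builds on inverse probability weighting (IPW). As a minimal sketch of the weighting step only, assuming the response propensities P(R = 1 | Y, X) have already been estimated (the paper obtains them from a fully nonparametric exponential tilting model fitted by sieve minimum distance, which is not reproduced here), a normalized IPW mean and an intercept-only IPW quantile can be computed as follows. The function names and inputs are hypothetical illustrations, not the authors' code.

```python
from typing import Sequence


def ipw_mean(y: Sequence[float], observed: Sequence[int], propensity: Sequence[float]) -> float:
    """Normalized (Hajek) IPW estimate of E[Y] from the observed responses.

    propensity[i] is an assumed, already-estimated value of P(R_i = 1 | Y_i, X_i);
    y[i] is never used when observed[i] == 0.
    """
    num = den = 0.0
    for yi, ri, pi in zip(y, observed, propensity):
        if ri:
            w = 1.0 / pi  # inverse probability weight
            num += w * yi
            den += w
    return num / den


def ipw_quantile(y: Sequence[float], observed: Sequence[int],
                 propensity: Sequence[float], tau: float) -> float:
    """Minimizer of sum_i (R_i / pi_i) * rho_tau(Y_i - q): the weighted tau-quantile.

    This is the intercept-only special case of IPW quantile regression.
    """
    pairs = sorted((yi, 1.0 / pi) for yi, ri, pi in zip(y, observed, propensity) if ri)
    total = sum(w for _, w in pairs)
    cum = 0.0
    for yi, w in pairs:
        cum += w
        if cum >= tau * total:
            return yi
    return pairs[-1][0]  # guards against floating-point shortfall when tau is close to 1
```

Under NMAR the propensity may depend on Y itself, which is why the sketch takes one propensity value per observation instead of computing it from covariates alone.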
{"title":"Quantile feature screening for infinite dimensional data under FDR control","authors":"Zhentao Tian, Zhongzhan Zhang","doi":"10.1016/j.csda.2025.108132","DOIUrl":"10.1016/j.csda.2025.108132","url":null,"abstract":"<div><div>This study is focused on the detection of effects of features on an infinite dimensional response through the conditional spatial quantiles (CSQ) of the response given the features, and develops a novel model-free feature screening procedure for the CSQ regression function. Firstly, a new metric named kernel-based conditional quantile dependence (KCQD) is proposed to measure the dependence of the CSQ on a feature. The metric equals 0 if and only if the feature is independent of the CSQ of the response, and thus is employed to detect the contribution of a feature. Then a two-step feature screening procedure with the estimated KCQD scores is developed via a distributed strategy. Theoretical analyses reveal that the new two-step screening method not only has screening consistency and sure screening properties but also achieves control over false discovery rate (FDR). Simulation studies show its ability to control the expected FDR level while maintaining high screening power. The proposed procedure is applied to analyze a magnetoencephalography dataset, and the identified signal positions are anatomically interpretable.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108132"},"PeriodicalIF":1.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Markov model for estimating the cost-effectiveness of immunotherapy for newly diagnosed multiple myeloma patients","authors":"Massimo Bilancia , Antonio Giovanni Solimando , Fabio Manca , Angelo Vacca , Roberto Ria","doi":"10.1016/j.csda.2025.108130","DOIUrl":"10.1016/j.csda.2025.108130","url":null,"abstract":"<div><div>Multiple myeloma (MM) is a malignancy of plasma cells, originating from B lymphocytes and accumulating within the bone marrow. The prevalence of MM has increased in industrialized countries, representing 1-1.8% of all cancers and 15% of hematologic malignancies. Immunotherapy has broadened therapeutic options for MM, offering treatments with generally improved efficacy and reduced toxicity compared to conventional therapies. Daratumumab, a monoclonal antibody recently granted regulatory approval, exemplifies this advancement, demonstrating improved patient outcomes. However, the substantial cost of daratumumab has significantly increased per-patient treatment expenditures. Consequently, the economic burden associated with this new class of therapies warrants careful evaluation of their cost-effectiveness. To address this, a six-state non-stationary Markov model was developed for cost-effectiveness analysis of immunotherapy in newly diagnosed MM patients and, more broadly, in the oncohematological patient population. 
This model aims to provide healthcare professionals and policymakers with actionable insights into cost-effective interventions, supporting informed decisions regarding optimal treatment strategies.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108130"},"PeriodicalIF":1.5,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
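The record above describes a six-state non-stationary Markov model for cost-effectiveness analysis, but the abstract does not list the states, transition probabilities, costs, or utilities. The sketch below is therefore a generic cohort calculation under stated assumptions: any number of states, a caller-supplied function `P(t)` returning the transition matrix for cycle t (this is what makes the chain non-stationary), rewards counted at the start of each cycle with no half-cycle correction, and a hypothetical default discount rate of 3% per cycle. All names are illustrative.

```python
from typing import Callable, Sequence, Tuple

import numpy as np


def run_cohort(P: Callable[[int], np.ndarray], start: Sequence[float],
               cost: Sequence[float], utility: Sequence[float],
               cycles: int, discount: float = 0.03) -> Tuple[float, float]:
    """Discounted total cost and QALYs for a cohort Markov model.

    P(t) returns the row-stochastic transition matrix used in cycle t.
    Rewards are counted at the start of each cycle (no half-cycle correction).
    """
    dist = np.asarray(start, dtype=float)       # share of the cohort in each state
    cost = np.asarray(cost, dtype=float)        # cost per cycle spent in each state
    utility = np.asarray(utility, dtype=float)  # QALY weight per cycle in each state
    total_cost = total_qaly = 0.0
    for t in range(cycles):
        factor = (1.0 + discount) ** -t         # discount factor for cycle t
        total_cost += factor * float(dist @ cost)
        total_qaly += factor * float(dist @ utility)
        dist = dist @ P(t)                      # advance the cohort one cycle
    return total_cost, total_qaly


def icer(cost_new: float, qaly_new: float, cost_ref: float, qaly_ref: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    return (cost_new - cost_ref) / (qaly_new - qaly_ref)
```

Running `run_cohort` once per treatment arm and passing both results to `icer` gives the incremental cost per QALY that is then compared against a willingness-to-pay threshold.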
{"title":"Extremal local linear quantile regression for nonlinear dependent processes","authors":"Fengyang He , Huixia Judy Wang","doi":"10.1016/j.csda.2025.108128","DOIUrl":"10.1016/j.csda.2025.108128","url":null,"abstract":"<div><div>Estimating extreme conditional quantiles accurately in the presence of data sparsity in the tails is a challenging and important problem. While there is existing literature on quantile analysis, limited work has been done on capturing nonlinear relationships in dependent data structures for extreme quantile estimation. A novel estimation procedure is proposed that combines local linear quantile regression with extreme value theory. A new enhanced Hill estimator for the conditional extreme value index is developed, constructed from local linear quantile estimators at a sequence of quantile levels. This approach allows data-adaptive weights to be assigned to different quantiles, providing flexibility and the potential to improve estimation efficiency. Furthermore, an estimator for extreme conditional quantiles is proposed that extrapolates from the intermediate quantiles. The methodology enables both point and interval estimation of extreme conditional quantiles for processes with an <em>α</em>-mixing dependence structure. The Bahadur representation of the intermediate quantile estimators within the local linear extreme-quantile framework is derived, and the asymptotic properties of the proposed estimators are established. 
Simulation studies and real data analysis are conducted to demonstrate the effectiveness and performance of the proposed methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108128"},"PeriodicalIF":1.5,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
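The record above extends the Hill estimator and extrapolates from intermediate to extreme quantiles. For orientation, here is a minimal sketch of the classical, unconditional building blocks for an i.i.d. positive sample: the Hill estimator of the extreme value index from the k largest values, and Weissman-type extrapolation from the intermediate level 1 - k/n to an extreme level tau. It is not the authors' conditional, local linear, data-weighted estimator; the choice of k is left to the caller.

```python
import math
from typing import Sequence


def hill_estimator(sample: Sequence[float], k: int) -> float:
    """Classical Hill estimate of the extreme value index from the k largest values."""
    xs = sorted(sample, reverse=True)  # xs[0] is the sample maximum
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < n")
    threshold = xs[k]                  # (k+1)-th largest value, X_(n-k)
    if threshold <= 0:
        raise ValueError("the Hill estimator needs positive upper order statistics")
    return sum(math.log(xs[i]) - math.log(threshold) for i in range(k)) / k


def weissman_quantile(sample: Sequence[float], k: int, tau: float) -> float:
    """Extrapolate from the intermediate quantile X_(n-k) (level 1 - k/n) to level tau."""
    n = len(sample)
    gamma = hill_estimator(sample, k)
    threshold = sorted(sample, reverse=True)[k]
    return threshold * (k / (n * (1.0 - tau))) ** gamma
```

At tau = 1 - k/n the extrapolation factor equals 1, so the estimate reduces to the intermediate order statistic; larger tau scales it up by a power law with exponent gamma.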
{"title":"Heterogeneity-aware transfer learning for high-dimensional linear regression models","authors":"Yanjin Peng, Lei Wang","doi":"10.1016/j.csda.2025.108129","DOIUrl":"10.1016/j.csda.2025.108129","url":null,"abstract":"<div><div>Transfer learning can refine the performance of a target model through utilizing beneficial information from relevant source datasets. In practice, however, auxiliary samples may be collected from different sub-populations with non-negligible heterogeneity. In this paper we assume that each dataset involves a common parameter vector and dataset-specific nuisance parameters and extend the transfer learning framework to account for heterogeneous models. Specifically, we adapt the decorrelated score technique to deal with the dataset-specific nuisance parameters and develop a strategy to leverage possible shared information from relevant source datasets. To avoid negative transfer, a completely data-driven algorithm is provided to determine the transferable sources. The convergence rate of the proposed estimator is investigated and the source detection consistency is also verified. Extensive numerical experiments are conducted to evaluate the proposed transfer learning algorithms, and an application to the Genotype-Tissue Expression dataset is exhibited.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108129"},"PeriodicalIF":1.5,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust generalized canonical correlation analysis based on scatter matrices","authors":"Nadia L. Kudraszow , Alejandra V. Vahnovan , Julieta Ferrario , M. Victoria Fasano","doi":"10.1016/j.csda.2025.108126","DOIUrl":"10.1016/j.csda.2025.108126","url":null,"abstract":"<div><div>Generalized Canonical Correlation Analysis (GCCA) is a powerful tool for analyzing and understanding linear relationships between multiple sets of variables. However, its classical estimators are highly sensitive to outliers, which can significantly affect the results of the analysis. A functional version of GCCA is proposed, based on scatter matrices, leading to robust and Fisher-consistent estimators for appropriate choices of the scatter matrix. In cases where scatter matrices are ill-conditioned, a modification based on an estimate of the precision matrix is introduced. A procedure to identify influential observations is also developed. A simulation study evaluates the finite-sample performance of the proposed methods under clean and contaminated samples. The advantages of the influential data detection approach are demonstrated through an application to a real dataset.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108126"},"PeriodicalIF":1.5,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient and distribution-free symmetry test for high-dimensional data based on energy statistics and random projections","authors":"Bo Chen , Feifei Chen , Junxin Wang , Tao Qiu","doi":"10.1016/j.csda.2024.108123","DOIUrl":"10.1016/j.csda.2024.108123","url":null,"abstract":"<div><div>Testing for departures from symmetry is a critical issue in statistics. Over the last two decades, substantial effort has been invested in developing tests for central symmetry in multivariate and high-dimensional contexts. Traditional tests, which rely on Euclidean distance, face significant challenges with high-dimensional data. These tests struggle to capture overall central symmetry and are often limited to verifying whether the distribution's center aligns with the coordinate origin, a problem exacerbated by the “curse of dimensionality.” Furthermore, they tend to be computationally intensive, often making them impractical for large datasets. To overcome these limitations, we propose a nonparametric test based on the random projected energy distance, extending the energy distance test through random projections. This method effectively reduces data dimensions by projecting high-dimensional data onto lower-dimensional spaces, with the randomness ensuring maximum preservation of information. Theoretically, as the number of random projections approaches infinity, the risk of power loss from inadequate directions is mitigated. Leveraging <em>U</em>-statistic theory, our test's asymptotic null distribution is standard normal, which holds true regardless of the data dimensionality relative to sample size, thus eliminating the need for resampling to determine critical values. For computational efficiency with large datasets, we adopt a divide-and-average strategy, partitioning the data into smaller blocks of size <em>m</em>. Within each block, the estimates of the random projected energy distance are normally distributed. 
By averaging these estimates across all blocks, we derive a test statistic that is asymptotically standard normal. This method significantly reduces computational complexity from quadratic to linear in sample size, enhancing the feasibility of our test for extensive data analysis. Through extensive numerical studies, we have demonstrated the robust empirical performance of our test in terms of size and power, affirming its practical utility in statistical applications for high-dimensional data.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108123"},"PeriodicalIF":1.5,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
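The record above combines energy distance, random projections, and a divide-and-average scheme. A minimal sketch of one plausible reading, assuming the projected energy distance compares each one-dimensional projection u'X with its reflection -u'X: for that pair the energy distance equals 2(E|u'(X + X')| - E|u'(X - X')|), which is zero exactly when u'X is symmetric about 0. The function names, the number of projections, and the blocking into consecutive rows are assumptions, and the standardization to an N(0, 1) statistic is omitted because the abstract does not give the variance estimator.

```python
import numpy as np


def projected_symmetry_energy(x: np.ndarray, n_proj: int, rng: np.random.Generator) -> float:
    """U-statistic estimate of E|u'(X + X')| - E|u'(X - X')|, averaged over random unit directions u.

    The two expectations coincide when X and -X have the same distribution
    (central symmetry about the origin), so large positive values suggest asymmetry.
    """
    n, p = x.shape
    u = rng.standard_normal((n_proj, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # uniform random directions on the sphere
    z = x @ u.T                                    # (n, n_proj) one-dimensional projections
    plus = np.abs(z[:, None, :] + z[None, :, :])   # |u'(X_i + X_j)|
    minus = np.abs(z[:, None, :] - z[None, :, :])  # |u'(X_i - X_j)|
    off_diag = ~np.eye(n, dtype=bool)              # U-statistic: drop the i == j pairs
    return float((plus - minus)[off_diag].mean())


def divide_and_average(x: np.ndarray, m: int, n_proj: int = 50, seed: int = 0) -> float:
    """Split the rows into consecutive blocks of size m and average the block estimates."""
    if m < 2 or len(x) < m:
        raise ValueError("need 2 <= m <= n")
    rng = np.random.default_rng(seed)
    blocks = [x[i:i + m] for i in range(0, len(x) - m + 1, m)]
    return float(np.mean([projected_symmetry_energy(b, n_proj, rng) for b in blocks]))
```

Each block costs O(m^2 * n_proj) operations and there are about n/m blocks, so for fixed m the total work grows linearly in n, which is the source of the quadratic-to-linear reduction described in the abstract.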
{"title":"Sparse vertex discriminant analysis: Variable selection for biomedical classification applications","authors":"Alfonso Landeros , Seyoon Ko , Jack Z. Chang , Tong Tong Wu , Kenneth Lange","doi":"10.1016/j.csda.2025.108125","DOIUrl":"10.1016/j.csda.2025.108125","url":null,"abstract":"<div><div>Modern biomedical datasets are often high-dimensional at multiple levels of biological organization. Practitioners must therefore grapple with data to estimate sparse or low-rank structures so as to adhere to the principle of parsimony. Further complicating matters is the presence of groups in data, each of which may have distinct associations with explanatory variables or be characterized by fundamentally different covariates. These themes in data analysis are explored in the context of classification. Vertex Discriminant Analysis (VDA) offers flexible linear and nonlinear models for classification that generalize the advantages of support vector machines to data with multiple classes. The proximal distance principle, which leverages projection and proximal operators in the design of practical algorithms, handily facilitates variable selection in VDA via nonconvex distance-to-set penalties directly controlling the number of active variables. Two flavors of sparse VDA are developed to address data in which instances may be homogeneous or heterogeneous with respect to predictors characterizing classes. 
Empirical studies illustrate how VDA is adapted to class-specific variable selection on simulated and real datasets, with an emphasis on applications to cancer classification via gene expression patterns.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108125"},"PeriodicalIF":1.5,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
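The record above uses the proximal distance principle with distance-to-set penalties that directly cap the number of active variables. As a minimal sketch of that principle on a simpler problem, sparse least squares rather than VDA's vertex-based classification loss, the sparsity set is S_k = {b : at most k nonzero entries}, its Euclidean projection keeps the k largest |b_j|, and each iteration minimizes the loss plus (rho/2)||b - P_S(b_n)||^2 while rho grows. The starting rho, growth factor, cap, and iteration count are hypothetical settings.

```python
import numpy as np


def project_sparse(beta: np.ndarray, k: int) -> np.ndarray:
    """Euclidean projection onto {b : at most k nonzeros}: keep the k largest |b_j|."""
    out = np.zeros_like(beta, dtype=float)
    if k > 0:
        keep = np.argsort(np.abs(beta))[-k:]
        out[keep] = beta[keep]
    return out


def dist_penalty(beta: np.ndarray, k: int, rho: float) -> float:
    """Distance-to-set penalty (rho / 2) * dist(beta, S_k)^2."""
    return 0.5 * rho * float(np.sum((beta - project_sparse(beta, k)) ** 2))


def proximal_distance_ls(X: np.ndarray, y: np.ndarray, k: int, rho: float = 1.0,
                         rho_growth: float = 1.2, rho_max: float = 1e6,
                         iters: int = 200) -> np.ndarray:
    """Minimize 0.5 * ||y - X b||^2 + dist_penalty(b, k, rho) with an increasing rho."""
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.zeros(p)
    for _ in range(iters):
        anchor = project_sparse(beta, k)  # majorize the distance by distance to a fixed point
        beta = np.linalg.solve(XtX + rho * np.eye(p), Xty + rho * anchor)
        rho = min(rho * rho_growth, rho_max)
    return project_sparse(beta, k)        # report an exactly k-sparse estimate
```

Because the penalty controls the number of nonzeros k directly, there is no shrinkage parameter to tune; the price is a nonconvex constraint, which the increasing rho handles by gradually pulling iterates onto S_k.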
{"title":"Component selection and variable selection for mixture regression models","authors":"Xuefei Qi , Xingbai Xu , Zhenghui Feng , Heng Peng","doi":"10.1016/j.csda.2024.108124","DOIUrl":"10.1016/j.csda.2024.108124","url":null,"abstract":"<div><div>Finite mixture regression models are commonly used to account for heterogeneity in populations and situations where the assumptions required for standard regression models may not hold. To expand the range of applicable distributions for components beyond the Gaussian distribution, other distributions, such as the exponential power distribution, the skew-normal distribution, and so on, are explored. To enable simultaneous model estimation, order selection, and variable selection, a penalized likelihood estimation approach that imposes penalties on both the mixing proportions and regression coefficients, which we call the double-penalized likelihood method, is proposed in this paper. Four double-penalized likelihood functions and their performance are studied. The consistency of the estimators, order selection, and variable selection is investigated. A modified expectation–maximization algorithm is proposed to implement the double-penalized likelihood method. Numerical simulations demonstrate the effectiveness of our proposed method and algorithm. Finally, the results of real data analysis are presented to illustrate the application of our approach. 
Overall, our study contributes to the development of mixture regression models and provides a useful tool for model and variable selection.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108124"},"PeriodicalIF":1.5,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mean shift-based clustering for misaligned functional data","authors":"Andrew Welbaum, Wanli Qiao","doi":"10.1016/j.csda.2024.108107","DOIUrl":"10.1016/j.csda.2024.108107","url":null,"abstract":"<div><div>Misalignment often occurs in functional data and can severely impact their clustering results. A clustering algorithm for misaligned functional data is developed by adapting the original mean shift algorithm in Euclidean space. The adapted algorithm is applied to the quotient space of the orbits of the square root velocity functions induced by the misaligned functional data, which is equipped with the elastic distance. Convergence properties of this algorithm are studied. The efficacy of the algorithm is demonstrated through simulations and various real data applications.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"206 ","pages":"Article 108107"},"PeriodicalIF":1.5,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
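The record above adapts the original Euclidean mean shift algorithm. For reference, a minimal sketch of that Euclidean starting point with a Gaussian kernel follows: every point is repeatedly moved to the kernel-weighted mean of the data until it settles at a mode, and points sharing a mode form a cluster. It performs no square root velocity transform, elastic alignment, or computation on the quotient space, so it is not the authors' algorithm; `bandwidth` and `merge_tol` are hypothetical tuning parameters.

```python
import numpy as np


def mean_shift(points: np.ndarray, bandwidth: float, iters: int = 200, tol: float = 1e-8) -> np.ndarray:
    """Move a copy of every point uphill on a Gaussian kernel density estimate."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(iters):
        sq = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)  # squared distances
        w = np.exp(-0.5 * sq / bandwidth**2)                             # Gaussian kernel weights
        new = (w @ points) / w.sum(axis=1, keepdims=True)                 # kernel-weighted means
        shift = np.max(np.abs(new - modes))
        modes = new
        if shift < tol:
            break
    return modes


def assign_clusters(modes: np.ndarray, merge_tol: float) -> np.ndarray:
    """Give points whose converged modes lie within merge_tol of each other the same label."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for c, center in enumerate(centers):
            if np.linalg.norm(m - center) < merge_tol:
                labels[i] = c
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels
```

Applied directly to discretized curves, this Euclidean version would treat two phase-shifted copies of the same shape as far apart, which is the misalignment problem that motivates working with the elastic distance instead.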