{"title":"Longitudinal Modeling of Rank-based Global Outcome.","authors":"Maomao Ding, Jing Ning, Xuming He, Anne-Marie Wills, Ruosha Li","doi":"10.5705/ss.202023.0049","DOIUrl":"10.5705/ss.202023.0049","url":null,"abstract":"<p><p>Many chronic diseases exhibit multifaceted symptoms that cannot be comprehensively characterized by one outcome. To address this, researchers often adopt a global outcome to combine information from multiple individual outcomes. The global rank-sum facilitates robust integration of multiple outcomes and has been applied in many clinical studies. We consider longitudinal settings and devise a global percentile outcome for depicting patients' time-varying global disease burden. We develop useful regression strategies for the longitudinal global percentile outcome based on a flexible regression framework of the monotonic index model. Posing minimal restrictions, we propose a maximum rank correlation type estimator and show that it entails desirable asymptotic properties. The methods are also extended to accommodate the common missing at random dropout scenarios. We propose a computationally stable and efficient procedure for parameter estimation, as well as a perturbation scheme for consistent variance estimation. Numerical studies show that our method performs well under realistic settings. 
We apply the proposed method to data from a Parkinson's disease clinical trial to examine risk factors associated with elevated global disease burden and accelerated disease progression.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"36 2","pages":"649-668"},"PeriodicalIF":1.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13060656/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147647325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STATISTICAL INFERENCE FOR MEAN FUNCTIONS OF COMPLEX 3D OBJECTS.","authors":"Yueying Wang, Guannan Wang, Brandon Klinedinst, Auriel Willette, Li Wang","doi":"10.5705/ss.202023.0071","DOIUrl":"10.5705/ss.202023.0071","url":null,"abstract":"<p><p>The use of complex three-dimensional (3D) objects is growing in various applications as data collection techniques continue to evolve. Identifying and locating significant effects within these objects is essential for making informed decisions based on the data. This article presents an advanced nonparametric method for learning and inferring complex 3D objects, enabling accurate estimation of the underlying signals and efficient detection and localization of significant effects. The proposed method addresses the problem of analyzing irregular-shaped 3D objects by modeling them as functional data and utilizing trivariate spline smoothing based on triangulations to estimate the underlying signals. We develop a highly efficient procedure that accurately estimates the mean and covariance functions, as well as the eigenvalues and eigenfunctions. Furthermore, we rigorously establish the asymptotic properties of these estimators. Additionally, a novel approach for constructing simultaneous confidence corridors to quantify estimation uncertainty is presented, and the procedure is extended to accommodate comparisons between two independent samples. 
The finite-sample performance of the proposed methods is illustrated through numerical experiments and a real-data application using the Alzheimer's Disease Neuroimaging Initiative database.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":"1451-1477"},"PeriodicalIF":1.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12419769/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70939966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COMMUNITY EXTRACTION OF NETWORK DATA UNDER STOCHASTIC BLOCK MODELS.","authors":"Quan Yuan, Binghui Liu, Danning Li, Yanyuan Ma","doi":"10.5705/ss.202022.0372","DOIUrl":"10.5705/ss.202022.0372","url":null,"abstract":"<p><p>Most existing community discovery methods focus on partitioning all nodes of the network into communities. However, many real networks contain background nodes that do not belong to any community. In such a situation, typical methods tend to artificially split the background nodes and group them together with communities that have relatively stronger connections, and hence lead to distorted results. To avoid this, some community extraction methods have been developed to achieve community discovery with background nodes. However, these methods are based on searching algorithms and hence have difficulty handling large-scale networks due to high computational complexity. To this end, in this paper we propose some algorithms with polynomial complexity to achieve community extraction of large-scale networks. We rigorously show that the proposed algorithms have attractive theoretical properties. In particular, the estimators of the community labels using the proposed algorithms reach the asymptotic minimax risk under the community extraction model, a specific stochastic block model. 
Then, we illustrate the advantages and feasibility of the proposed algorithms via extensive simulated networks and a political blog network.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"35 SI 2","pages":"1789-1809"},"PeriodicalIF":1.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13008304/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147516180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional Subgroup Regression Analysis.","authors":"Fei Jiang, Lu Tian, Jian Kang, Lexin Li","doi":"10.5705/ss.202023.0075","DOIUrl":"10.5705/ss.202023.0075","url":null,"abstract":"<p><p>Classical regression generally assumes that all subjects follow a common model with the same set of parameters. With ever advancing capabilities of modern technologies to collect more subjects and more covariates, it has become increasingly common that there exist subgroups of subjects, and each group follows a different regression model with a different set of parameters. In this article, we propose a new approach for subgroup analysis in regression modeling. Specifically, we model the relation between a response and a set of primary predictors, while we explicitly model the heterogeneous association given another set of auxiliary predictors, through the interaction between the primary and auxiliary variables. We introduce penalties to induce the sparsity and group structures within the regression coefficients, and to achieve simultaneous feature selection for both primary predictors that are significantly associated with the response, as well as the auxiliary predictors that define the subgroups. We establish the asymptotic guarantees in terms of parameter estimation consistency and cluster estimation consistency. 
We illustrate our method with an analysis of the functional magnetic resonance imaging data from the Adolescent Brain Cognitive Development Study.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"35 3","pages":"1713-1736"},"PeriodicalIF":1.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12344502/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144849473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Paradigm for Generative Adversarial Networks based on Randomized Decision Rules.","authors":"Sehwan Kim, Qifan Song, Faming Liang","doi":"10.5705/ss.202022.0404","DOIUrl":"10.5705/ss.202022.0404","url":null,"abstract":"<p><p>The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity among generated data. In this paper, we identify the reasons why the GAN suffers from this issue, and to address it, we propose a new formulation for the GAN based on randomized decision rules. In the new formulation, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. We propose to train the GAN by an empirical Bayes-like method by treating the discriminator as a hyper-parameter of the posterior distribution of the generator. Specifically, we simulate generators from its posterior distribution conditioned on the discriminator using a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm, and update the discriminator using stochastic gradient descent along with simulations of the generators. We establish convergence of the proposed method to the Nash equilibrium. Apart from image generation, we apply the proposed method to nonparametric clustering and nonparametric conditional independence tests. 
A portion of the numerical results is presented in the supplementary material.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"35 2","pages":"897-918"},"PeriodicalIF":1.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12017776/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144028320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-response Regression for Block-missing Multi-modal Data without Imputation.","authors":"Haodong Wang, Quefeng Li, Yufeng Liu","doi":"10.5705/ss.202021.0170","DOIUrl":"10.5705/ss.202021.0170","url":null,"abstract":"<p><p>Multi-modal data are prevalent in many scientific fields. In this study, we consider the parameter estimation and variable selection for a multi-response regression using block-missing multi-modal data. Our method allows the dimensions of both the responses and the predictors to be large, and the responses to be incomplete and correlated, a common practical problem in high-dimensional settings. Our proposed method uses two steps to make a prediction from a multi-response linear regression model with block-missing multi-modal predictors. In the first step, without imputing missing data, we use all available data to estimate the covariance matrix of the predictors and the cross-covariance matrix between the predictors and the responses. In the second step, we use these matrices and a penalized method to simultaneously estimate the precision matrix of the response vector, given the predictors, and the sparse regression parameter matrix. 
Lastly, we demonstrate the effectiveness of the proposed method using theoretical studies, simulated examples, and an analysis of a multi-modal imaging data set from the Alzheimer's Disease Neuroimaging Initiative.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":"527-546"},"PeriodicalIF":1.4,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11035992/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"INTEGRATING INCOMPLETE DATA FOR MEDIATION ANALYSIS.","authors":"Andriy Derkach, Joshua N Sampson, Ruth M Pfeiffer","doi":"10.5705/ss.202021.0373","DOIUrl":"10.5705/ss.202021.0373","url":null,"abstract":"<p><p>Mediation analysis examines the relationships between an exposure, a mediator, and an outcome. Although many approaches are available for performing such analyses, they all require access to a single complete data set that contains the three key variables: outcome, exposure, and mediator. Here, we propose semiparametric methods for mediation analysis to estimate the standard causal parameters (direct and indirect effects) by combining information from several incomplete data sets, each containing only two of the three key variables. Importantly, our methods also handle scenarios in which only summary statistics based on those data sets are available. The resulting estimates of the causal parameters are asymptotically unbiased and normally distributed. We evaluate the performance of our methods in finite samples using simulations, and quantify the loss in efficiency from the lack of a complete data set with all three variables. We then apply the proposed method to determine whether the number of terminal duct lobular units in the breast mediates the relationship between a polygenic risk score and breast cancer risk.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":"1045-1066"},"PeriodicalIF":1.2,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13048772/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact Analysis for Spatial Autoregressive Models: With Application to Air Pollution in China","authors":"Hsuan-Yu Chang, Jihai Yu","doi":"10.5705/ss.202021.0119","DOIUrl":"https://doi.org/10.5705/ss.202021.0119","url":null,"abstract":"In this paper, we investigate impact analysis and its asymptotic inference for spatial autoregressive models. LeSage and Pace (2009) introduce impact analysis for spatial models and use Monte Carlo simulations to compute the dispersion. We propose to use the delta method, which enables us to obtain the dispersion in an explicit form. In addition, we provide the element-wise impact analysis. We first study the cross-sectional case, where various impacts are introduced to measure the interaction and feedback effects in a space dimension. We then study the spatial dynamic panel case with simultaneous spatial and dynamic feedback involved in the impacts. Monte Carlo results show that the proposed impact analysis has satisfactory finite sample properties. Finally, we apply impact analysis to investigate how meteorological factors and air pollutants affect PM2.5 in Chinese cities.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"36 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonlinear dimension reduction for functional data with application to clustering","authors":"Ruoxu Tan, Yiming Zang, G. Yin","doi":"10.5705/ss.202021.0393","DOIUrl":"https://doi.org/10.5705/ss.202021.0393","url":null,"abstract":"Nonlinear dimension reduction for functional data with application to clustering","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unbiased Boosting Estimation for Censored Survival Data","authors":"Li‐Pang Chen, G. Yi","doi":"10.5705/ss.202021.0050","DOIUrl":"https://doi.org/10.5705/ss.202021.0050","url":null,"abstract":"Boosting methods have been broadly discussed for various settings, and most methods handle data with complete observations. Although some methods are available for survival data with censored responses, they tend to assume a specific model for the survival process, and most provide numerical implementation procedures without rigorous theoretical justifications. In this paper, we develop an unbiased boosting estimation method for censored survival data, without assuming an explicit model, and explore three strategies for adjusting the loss functions, while accommodating censoring effects. We implement the proposed method using a functional gradient descent algorithm, and rigorously establish our theoretical results, including the consistency and optimization convergence. Our numerical studies show that the proposed method exhibits satisfactory performance in finite-sample settings.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70936904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}