{"title":"PENALIZED REGRESSION FOR MULTIPLE TYPES OF MANY FEATURES WITH MISSING DATA.","authors":"Kin Yau Wong, Donglin Zeng, D Y Lin","doi":"10.5705/ss.202020.0401","DOIUrl":"10.5705/ss.202020.0401","url":null,"abstract":"<p><p>Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"33 2","pages":"633-662"},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10187615/pdf/nihms-1764514.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9482840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sieve estimation of a class of partially linear transformation models with interval-censored competing risks data.","authors":"Xuewen Lu, Yan Wang, Dipankar Bandyopadhyay, Giorgos Bakoyannis","doi":"10.5705/ss.202021.0051","DOIUrl":"10.5705/ss.202021.0051","url":null,"abstract":"<p><p>In this paper, we consider a class of partially linear transformation models with interval-censored competing risks data. Under a semiparametric generalized odds rate specification for the cause-specific cumulative incidence function, we obtain optimal estimators of the large number of parametric and nonparametric model components via maximizing the likelihood function over a joint B-spline and Bernstein polynomial spanned sieve space. Our specification considers a relatively simpler finite-dimensional parameter space, approximating the infinite-dimensional parameter space as <i>n</i> → ∞, thereby allowing us to study the almost sure consistency, and rate of convergence for all parameters, and the asymptotic distributions and efficiency of the finite-dimensional components. We study the finite sample performance of our method through simulation studies under a variety of scenarios. Furthermore, we illustrate our methodology via application to a dataset on HIV-infected individuals from sub-Saharan Africa.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"33 2","pages":"685-704"},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10208244/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9526092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES.","authors":"Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma","doi":"10.5705/ss.202021.0002","DOIUrl":"10.5705/ss.202021.0002","url":null,"abstract":"<p><p>This study has been motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively, under which the covariates affect the response differently in subgroups. High-dimensional molecular and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. For simpler analysis, they have been shown to contain overlapping, but also independent information. In this article, our goal is to conduct the first and more effective FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"33 2","pages":"729-758"},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686523/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138463958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Necessary and Sufficient Conditions for Multiple Objective Optimal Regression Designs","authors":"Lucy L. Gao, J. Ye, Shangzhi Zeng, Julie Zhou","doi":"10.5705/ss.202022.0328","DOIUrl":"https://doi.org/10.5705/ss.202022.0328","url":null,"abstract":"We typically construct optimal designs based on a single objective function. To better capture the breadth of an experiment's goals, we could instead construct a multiple objective optimal design based on multiple objective functions. While algorithms have been developed to find multi-objective optimal designs (e.g. efficiency-constrained and maximin optimal designs), it is far less clear how to verify the optimality of a solution obtained from an algorithm. In this paper, we provide theoretical results characterizing optimality for efficiency-constrained and maximin optimal designs on a discrete design space. We demonstrate how to use our results in conjunction with linear programming algorithms to verify optimality.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42471367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Sliced Inverse Regression via Cholesky Matrix Penalization","authors":"Linh Nghiem, Francis K.C. Hui, Samuel Mueller, A.H.Welsh","doi":"10.5705/ss.202020.0406","DOIUrl":"https://doi.org/10.5705/ss.202020.0406","url":null,"abstract":"We introduce a new sparse sliced inverse regression estimator called Cholesky matrix penalization and its adaptive version for achieving sparsity in estimating the dimensions of the central subspace. The new estimators use the Cholesky decomposition of the covariance matrix of the covariates and include a regularization term in the objective function to achieve sparsity in a computationally efficient manner. We establish the theoretical values of the tuning parameters that achieve estimation and variable selection consistency for the central subspace. Furthermore, we propose a new projection information criterion to select the tuning parameter for our proposed estimators and prove that the new criterion facilitates selection consistency. The Cholesky matrix penalization estimator inherits the strength of the Matrix Lasso and the Lasso sliced inverse regression estimator; it has superior performance in numerical studies and can be adapted to other sufficient dimension methods in the literature.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135181008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"That Prasad-Rao is Robust: Estimation of Mean Squared Prediction Error of Observed Best Predictor under Potential Model Misspecification","authors":"Xiaohui Liu, Haiqiang Ma, Jiming Jiang","doi":"10.5705/ss.202020.0325","DOIUrl":"https://doi.org/10.5705/ss.202020.0325","url":null,"abstract":"","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135181024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Construction of Nonregular Two-Level Factorial Designs With Maximum Generalized Resolutions","authors":"Chenlu Shi, Boxin Tang","doi":"10.5705/ss.202021.0024","DOIUrl":"https://doi.org/10.5705/ss.202021.0024","url":null,"abstract":"","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135182934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric Bayesian Two-Level Clustering for Subject-Level Single-Cell Expression Data","authors":"Qiuyu Wu, Xiangyu Luo","doi":"10.5705/ss.202020.0337","DOIUrl":"https://doi.org/10.5705/ss.202020.0337","url":null,"abstract":"The advent of single-cell sequencing opens new avenues for personalized treatment. In this paper, we address a two-level clustering problem of simultaneous subject subgroup discovery (subject level) and cell type detection (cell level) for single-cell expression data from multiple subjects. However, current statistical approaches either cluster cells without considering the subject heterogeneity or group subjects without using the single-cell information. To bridge the gap between cell clustering and subject grouping, we develop a nonparametric Bayesian model, Subject and Cell clustering for Single-Cell expression data (SCSC) model, to achieve subject and cell grouping simultaneously. SCSC does not need to prespecify the subject subgroup number or the cell type number. It automatically induces subject subgroup structures and matches cell types across subjects. Moreover, it directly models the single-cell raw count data by deliberately considering the data's dropouts, library sizes, and over-dispersion. A blocked Gibbs sampler is proposed for the posterior inference. Simulation studies and the application to a multi-subject iPSC scRNA-seq dataset validate the ability of SCSC to simultaneously cluster subjects and cells.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135181000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS.","authors":"Peiyao Wang, Quefeng Li, Dinggang Shen, Yufeng Liu","doi":"10.5705/ss.202020.0145","DOIUrl":"10.5705/ss.202020.0145","url":null,"abstract":"<p><p>In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency for our proposed estimators, and show that it has better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"33 1","pages":"27-53"},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583735/pdf/nihms-1892524.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49684205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}