Statistica Sinica最新文献_第6页

Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis With Limited Computational Resources 子采样和折刀:计算资源有限的大数据分析的一种实用方便的解决方案

IF 1.4 3区数学

Statistica Sinica Pub Date : 2023-04-13 DOI: 10.5705/ss.202021.0257

Shuyuan Wu, Xuening Zhu, Hansheng Wang

{"title":"Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis With Limited Computational Resources","authors":"Shuyuan Wu, Xuening Zhu, Hansheng Wang","doi":"10.5705/ss.202021.0257","DOIUrl":"https://doi.org/10.5705/ss.202021.0257","url":null,"abstract":"Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most cases, they do not have powerful computational resources (e.g., Hadoop or Spark). How to practically analyze large datasets with limited computational resources then becomes a problem of great importance. To solve this problem, we propose here a novel subsampling-based method with jackknifing. The key idea is to treat the whole sample data as if they were the population. Then, multiple subsamples with greatly reduced sizes are obtained by the method of simple random sampling with replacement. It is remarkable that we do not recommend sampling methods without replacement because this would incur a significant cost for data processing on the hard drive. Such cost does not exist if the data are processed in memory. Because subsampled data have relatively small sizes, they can be comfortably read into computer memory as a whole and then processed easily. Based on subsampled datasets, jackknife-debiased estimators can be obtained for the target parameter. The resulting estimators are statistically consistent, with an extremely small bias. Finally, the jackknife-debiased estimators from different subsamples are averaged together to form the final estimator. We theoretically show that the final estimator is consistent and asymptotically normal. Its asymptotic statistical efficiency can be as good as that of the whole sample estimator under very mild conditions. The proposed method is simple enough to be easily implemented on most practical computer systems and thus should have very wide applicability.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48682548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction 高维充分降维中的无切片逆回归

IF 1.4 3区数学

Statistica Sinica Pub Date : 2023-04-13 DOI: 10.5705/ss.202022.0112

Qing Mai, X. Shao, Runmin Wang, Xin Zhang

{"title":"Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction","authors":"Qing Mai, X. Shao, Runmin Wang, Xin Zhang","doi":"10.5705/ss.202022.0112","DOIUrl":"https://doi.org/10.5705/ss.202022.0112","url":null,"abstract":"Sliced inverse regression (SIR, Li 1991) is a pioneering work and the most recognized method in sufficient dimension reduction. While promising progress has been made in theory and methods of high-dimensional SIR, two remaining challenges are still nagging high-dimensional multivariate applications. First, choosing the number of slices in SIR is a difficult problem, and it depends on the sample size, the distribution of variables, and other practical considerations. Second, the extension of SIR from univariate response to multivariate is not trivial. Targeting at the same dimension reduction subspace as SIR, we propose a new slicing-free method that provides a unified solution to sufficient dimension reduction with high-dimensional covariates and univariate or multivariate response. We achieve this by adopting the recently developed martingale difference divergence matrix (MDDM, Lee&Shao 2018) and penalized eigen-decomposition algorithms. To establish the consistency of our method with a high-dimensional predictor and a multivariate response, we develop a new concentration inequality for sample MDDM around its population counterpart using theories for U-statistics, which may be of independent interest. Simulations and real data analysis demonstrate the favorable finite sample performance of the proposed method.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41485273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Distributed Logistic Regression for Massive Data with Rare Events 具有罕见事件的海量数据的分布式逻辑回归

IF 1.4 3区数学

Statistica Sinica Pub Date : 2023-04-05 DOI: 10.5705/ss.202022.0242

Xia Li, Xuening Zhu, Hansheng Wang

引用次数: 17

PENALIZED REGRESSION FOR MULTIPLE TYPES OF MANY FEATURES WITH MISSING DATA. 对数据缺失的多种类型特征进行惩罚回归。

IF 1.4 3区数学

Statistica Sinica Pub Date : 2023-04-01 DOI: 10.5705/ss.202020.0401

Kin Yau Wong, Donglin Zeng, D Y Lin

引用次数: 0

Sieve estimation of a class of partially linear transformation models with interval-censored competing risks data. 利用区间删失竞争风险数据对一类部分线性变换模型进行筛式估计。

IF 1.4 3区数学

Statistica Sinica Pub Date : 2023-04-01 DOI: 10.5705/ss.202021.0051

Xuewen Lu, Yan Wang, Dipankar Bandyopadhyay, Giorgos Bakoyannis

引用次数: 0

HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES. 整合多来源高维数据与癌症研究应用的异质性分析。

IF 1.4 3区数学

Statistica Sinica Pub Date : 2023-04-01 DOI: 10.5705/ss.202021.0002

Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma

{"title":"HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES.","authors":"Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma","doi":"10.5705/ss.202021.0002","DOIUrl":"10.5705/ss.202021.0002","url":null,"abstract":"<p><p>This study has been motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively, under which the covariates affect the response differently in subgroups. High-dimensional molecular and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. For simpler analysis, they have been shown to contain overlapping, but also independent information. In this article, our goal is to conduct the first and more effective FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"33 2","pages":"729-758"},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686523/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138463958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Necessary and Sufficient Conditions for Multiple Objective Optimal Regression Designs 多目标最优回归设计的充分必要条件

IF 1.4 3区数学

Statistica Sinica Pub Date : 2023-03-08 DOI: 10.5705/ss.202022.0328

Lucy L. Gao, J. Ye, Shangzhi Zeng, Julie Zhou

引用次数: 0

Sparse Sliced Inverse Regression via Cholesky Matrix Penalization 基于Cholesky矩阵惩罚的稀疏切片逆回归

3区数学

Statistica Sinica Pub Date : 2023-01-01 DOI: 10.5705/ss.202020.0406

Linh Nghiem, Francis K.C. Hui, Samuel Mueller, A.H.Welsh

引用次数: 1

That Prasad-Rao is Robust: Estimation of Mean Squared Prediction Error of Observed Best Predictor under Potential Model Misspecification Prasad-Rao的鲁棒性:潜在模型错配下观测最佳预测器预测误差的均方估计