Statistica Sinica: Latest Articles

Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis With Limited Computational Resources
IF 1.4 | CAS Zone 3, Mathematics
Statistica Sinica | Published: 2023-04-13 | DOI: 10.5705/ss.202021.0257
Shuyuan Wu, Xuening Zhu, Hansheng Wang
Abstract: Modern statistical analysis often encounters very large datasets. For such datasets, conventional estimation methods can hardly be applied directly, because practitioners often have limited computational resources; in most cases they do not have access to powerful distributed platforms (e.g., Hadoop or Spark). How to practically analyze large datasets with limited computational resources therefore becomes a problem of great importance. To solve this problem, we propose a novel subsampling-based method with jackknifing. The key idea is to treat the whole sample as if it were the population. Multiple subsamples of greatly reduced size are then obtained by simple random sampling with replacement. Notably, we do not recommend sampling without replacement, because it would incur a significant data-processing cost on the hard drive; this cost does not exist if the data are processed in memory. Because the subsamples are relatively small, each can be comfortably read into computer memory as a whole and then processed easily. From each subsample, a jackknife-debiased estimator of the target parameter is obtained. The resulting estimators are statistically consistent, with an extremely small bias. Finally, the jackknife-debiased estimators from the different subsamples are averaged to form the final estimator. We show theoretically that the final estimator is consistent and asymptotically normal, and that its asymptotic statistical efficiency can be as good as that of the whole-sample estimator under very mild conditions. The proposed method is simple enough to be implemented on most practical computer systems and should therefore have very wide applicability.
Citations: 3
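The pipeline in this abstract (draw subsamples with replacement, jackknife-debias each subsample estimate, average the results) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the target is a toy parameter θ = μ², whose plug-in estimator x̄² is biased upward by σ²/n, and the "full data" is an in-memory list purely for simplicity (in the setting the abstract describes, it would live on disk). The function names are hypothetical.

```python
import random
from statistics import fmean


def theta_hat(sample):
    """Toy plug-in estimator of theta = mu**2; its bias is sigma**2 / n."""
    m = fmean(sample)
    return m * m


def jackknife_debiased(sample):
    """Jackknife bias correction: n * theta_hat - (n - 1) * mean of leave-one-out estimates.

    Requires len(sample) >= 2.
    """
    n = len(sample)
    total = sum(sample)
    # Leave-one-out estimates of mu**2, each computed in O(1) from the running total
    # (this shortcut is specific to the toy estimator above).
    loo = [((total - x) / (n - 1)) ** 2 for x in sample]
    return n * theta_hat(sample) - (n - 1) * fmean(loo)


def subsample_jackknife(data, subsample_size, num_subsamples, seed=0):
    """Average the jackknife-debiased estimates over subsamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_subsamples):
        # Simple random sampling with replacement, as the abstract recommends.
        sub = rng.choices(data, k=subsample_size)
        estimates.append(jackknife_debiased(sub))
    return fmean(estimates)
```

For this toy target the jackknife correction is exact: it returns x̄² − s²/n, the unbiased estimator of μ². A call such as `subsample_jackknife(data, subsample_size=500, num_subsamples=20)` touches only 10,000 records in memory regardless of how large `data` is.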
Slicing-free Inverse Regression in High-dimensional Sufficient Dimension Reduction
IF 1.4 | CAS Zone 3, Mathematics
Statistica Sinica | Published: 2023-04-13 | DOI: 10.5705/ss.202022.0112
Qing Mai, X. Shao, Runmin Wang, Xin Zhang
Abstract: Sliced inverse regression (SIR, Li 1991) is a pioneering work and the most recognized method in sufficient dimension reduction. While promising progress has been made in the theory and methods of high-dimensional SIR, two challenges still hamper high-dimensional multivariate applications. First, choosing the number of slices in SIR is difficult; the choice depends on the sample size, the distribution of the variables, and other practical considerations. Second, extending SIR from a univariate to a multivariate response is not trivial. Targeting the same dimension reduction subspace as SIR, we propose a new slicing-free method that provides a unified solution to sufficient dimension reduction with high-dimensional covariates and a univariate or multivariate response. We achieve this by adopting the recently developed martingale difference divergence matrix (MDDM, Lee & Shao 2018) and penalized eigen-decomposition algorithms. To establish the consistency of our method with a high-dimensional predictor and a multivariate response, we develop a new concentration inequality for the sample MDDM around its population counterpart using the theory of U-statistics, which may be of independent interest. Simulations and real data analysis demonstrate the favorable finite-sample performance of the proposed method.
Citations: 1
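The key ingredient named in this abstract is the sample MDDM. Below is a minimal pure-Python sketch, assuming the sample version defined by Lee & Shao (2018), MDDM_n(X | Y) = -(1/n²) Σ_{j,k} (X_j − X̄)(X_k − X̄)ᵀ ‖Y_j − Y_k‖, with X the covariates and Y the (possibly multivariate) response. It stops at the matrix itself: the penalized eigen-decomposition the authors use to extract the subspace is not reproduced, and the name `sample_mddm` is hypothetical.

```python
import math


def sample_mddm(X, Y):
    """Sample MDDM(X | Y) = -(1/n^2) * sum_{j,k} (X_j - Xbar)(X_k - Xbar)^T * ||Y_j - Y_k||.

    X: list of n covariate vectors (length p).
    Y: list of n response vectors (length q); wrap a scalar response as [y].
    Returns a p x p matrix as a list of lists.
    """
    n, p = len(X), len(X[0])
    xbar = [sum(row[a] for row in X) / n for a in range(p)]
    Xc = [[row[a] - xbar[a] for a in range(p)] for row in X]
    M = [[0.0] * p for _ in range(p)]
    for j in range(n):
        for k in range(n):
            # The response enters only through Euclidean distances, so no slicing
            # is needed and a multivariate Y is handled the same way as a scalar one.
            d = math.dist(Y[j], Y[k])
            if d == 0.0:
                continue
            for a in range(p):
                for b in range(p):
                    M[a][b] -= Xc[j][a] * Xc[k][b] * d
    return [[M[a][b] / n**2 for b in range(p)] for a in range(p)]
```

The double loop costs O(n²p²), which is fine for illustration; the point is that the number of slices, the tuning choice the abstract flags as difficult, never appears.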
Distributed Logistic Regression for Massive Data with Rare Events
IF 1.4 | CAS Zone 3, Mathematics
Statistica Sinica | Published: 2023-04-05 | DOI: 10.5705/ss.202022.0242
Xia Li, Xuening Zhu, Hansheng Wang
Abstract: Large-scale rare-events data are commonly encountered in practice. To tackle massive rare-events data, we propose a novel distributed estimation method for logistic regression in a distributed system. A distributed framework poses two challenges. The first is how to distribute the data; here, two distribution strategies (the RANDOM strategy and the COPY strategy) are investigated. The second is how to select an appropriate type of objective function so that the best asymptotic efficiency is achieved; here, under-sampled (US) and inverse probability weighted (IPW) objective functions are considered. Our results suggest that the COPY strategy combined with the IPW objective function is the best solution for distributed logistic regression with rare events. The finite-sample performance of the distributed methods is demonstrated through simulation studies and a real-world Sweden Traffic Sign dataset.
Citations: 17
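One possible reading of the COPY strategy and the IPW objective, sketched under explicit assumptions: we assume that under COPY every worker receives all rare events (y = 1) while the non-events are split evenly across workers, and that the IPW objective up-weights each retained non-event by the inverse of its inclusion probability. Both readings are our assumptions rather than the paper's definitions, and `copy_partition` and `ipw_loglik` are hypothetical names. With K workers, a non-event lands on a given worker with probability 1/K, so `neg_rate = 1 / K`.

```python
import math
import random


def copy_partition(data, num_workers, seed=0):
    """Assumed COPY-style split of (x, y) pairs: every worker gets all events (y == 1);
    non-events (y == 0) are shuffled and dealt out evenly across workers."""
    rng = random.Random(seed)
    positives = [row for row in data if row[1] == 1]
    negatives = [row for row in data if row[1] == 0]
    rng.shuffle(negatives)
    return [positives + negatives[k::num_workers] for k in range(num_workers)]


def ipw_loglik(beta, worker_data, neg_rate):
    """Inverse-probability-weighted logistic log-likelihood on one worker.

    Events are kept with probability 1; non-events were kept with probability
    neg_rate, so each one is up-weighted by 1 / neg_rate.
    """
    total = 0.0
    for x, y in worker_data:
        eta = sum(b * xi for b, xi in zip(beta, x))
        # Numerically stable log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        weight = 1.0 if y == 1 else 1.0 / neg_rate
        total += weight * (y * eta - log1pexp)
    return total
```

Each worker would maximize its own objective locally. The abstract does not say how worker estimates are combined; simple averaging would be one assumed choice.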
PENALIZED REGRESSION FOR MULTIPLE TYPES OF MANY FEATURES WITH MISSING DATA
IF 1.4 | CAS Zone 3, Mathematics
Statistica Sinica 33(2), pp. 633-662 | Published: 2023-04-01 | DOI: 10.5705/ss.202020.0401
Kin Yau Wong, Donglin Zeng, D Y Lin
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10187615/pdf/nihms-1764514.pdf
Abstract: Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.
Citations: 0
Sieve estimation of a class of partially linear transformation models with interval-censored competing risks data
IF 1.4 | CAS Zone 3, Mathematics
Statistica Sinica 33(2), pp. 685-704 | Published: 2023-04-01 | DOI: 10.5705/ss.202021.0051
Xuewen Lu, Yan Wang, Dipankar Bandyopadhyay, Giorgos Bakoyannis
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10208244/pdf/
Abstract: In this paper, we consider a class of partially linear transformation models with interval-censored competing risks data. Under a semiparametric generalized odds rate specification for the cause-specific cumulative incidence function, we obtain optimal estimators of the large number of parametric and nonparametric model components by maximizing the likelihood function over a sieve space spanned jointly by B-splines and Bernstein polynomials. Our specification uses a relatively simple finite-dimensional parameter space that approximates the infinite-dimensional parameter space as n → ∞, which allows us to study the almost sure consistency and rate of convergence of all parameters, as well as the asymptotic distributions and efficiency of the finite-dimensional components. We study the finite-sample performance of our method through simulation studies under a variety of scenarios. Furthermore, we illustrate our methodology with an application to a dataset on HIV-infected individuals from sub-Saharan Africa.
Citations: 0
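This abstract mentions a sieve space spanned jointly by B-splines and Bernstein polynomials. As an illustration of the Bernstein part only, here is a sketch that assumes (our assumption, not a detail stated above) it is used for a monotone nondecreasing component such as a cumulative baseline function. A Bernstein polynomial with nondecreasing coefficients is itself nondecreasing, so writing the coefficients as cumulative sums of exponentials turns a constrained fit into an unconstrained one. The function names and this reparameterization are hypothetical.

```python
import math


def bernstein_basis(t, degree, lo, hi):
    """Bernstein basis polynomials B_0..B_degree on [lo, hi], evaluated at t."""
    u = (t - lo) / (hi - lo)
    return [math.comb(degree, k) * u**k * (1 - u) ** (degree - k) for k in range(degree + 1)]


def monotone_sieve(t, gammas, lo, hi):
    """Nondecreasing sieve function sum_k phi_k * B_k(t) with phi_0 <= phi_1 <= ...

    phi_k is the cumulative sum of exp(gamma_j) for j <= k, so the unconstrained
    parameters gammas (one per coefficient) always yield nondecreasing coefficients.
    """
    phis, running = [], 0.0
    for g in gammas:
        running += math.exp(g)
        phis.append(running)
    basis = bernstein_basis(t, len(gammas) - 1, lo, hi)
    return sum(p * b for p, b in zip(phis, basis))
```

The basis functions are nonnegative and sum to one at every t, so the sieve function stays between its smallest and largest coefficient. The sieve grows with the sample size by increasing `len(gammas)`.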
HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES
IF 1.4 | CAS Zone 3, Mathematics
Statistica Sinica 33(2), pp. 729-758 | Published: 2023-04-01 | DOI: 10.5705/ss.202021.0002
Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10686523/pdf/
Abstract: This study was motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively; under it, the covariates affect the response differently in different subgroups. High-dimensional molecular features and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. For simpler analyses, they have been shown to contain overlapping but also independent information. In this article, our goal is to conduct the first, and a more effective, FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established, and an effective computational algorithm is developed. A simulation study and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.
Citations: 0
Necessary and Sufficient Conditions for Multiple Objective Optimal Regression Designs
IF 1.4 | CAS Zone 3, Mathematics
Statistica Sinica | Published: 2023-03-08 | DOI: 10.5705/ss.202022.0328
Lucy L. Gao, J. Ye, Shangzhi Zeng, Julie Zhou
Abstract: Optimal designs are typically constructed from a single objective function. To better capture the breadth of an experiment's goals, we could instead construct a multiple-objective optimal design based on several objective functions. While algorithms have been developed to find multiple-objective optimal designs (e.g., efficiency-constrained and maximin optimal designs), it is far less clear how to verify the optimality of a solution returned by an algorithm. In this paper, we provide theoretical results characterizing optimality for efficiency-constrained and maximin optimal designs on a discrete design space, and we demonstrate how to use these results together with linear programming algorithms to verify optimality.
Citations: 0
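The paper's conditions concern efficiency-constrained and maximin designs and are not reproduced here. For orientation, the classical single-objective analogue is the Kiefer-Wolfowitz equivalence theorem: on a discrete design space, an approximate design with weights w_i at points x_i and information matrix M = Σ w_i f(x_i) f(x_i)ᵀ is D-optimal if and only if max_x f(x)ᵀ M⁻¹ f(x) ≤ p, where p is the number of parameters. The sketch below checks that inequality directly, assuming M is nonsingular; the function names are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting (A nonsingular)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                for k in range(c, n + 1):
                    aug[r][k] -= factor * aug[c][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def info_matrix(design, f):
    """Information matrix M = sum_i w_i f(x_i) f(x_i)^T for design = [(x_i, w_i), ...]."""
    p = len(f(design[0][0]))
    M = [[0.0] * p for _ in range(p)]
    for x, w in design:
        v = f(x)
        for a in range(p):
            for b in range(p):
                M[a][b] += w * v[a] * v[b]
    return M


def is_d_optimal(design, candidates, f, tol=1e-9):
    """Equivalence-theorem check: max over candidates of f(x)^T M^{-1} f(x) <= p."""
    M = info_matrix(design, f)
    p = len(M)
    worst = max(
        sum(fi * zi for fi, zi in zip(f(x), solve(M, f(x)))) for x in candidates
    )
    return worst <= p + tol
```

For the straight-line model f(x) = (1, x) on the candidate set {-1, 0, 1}, the design with weight 1/2 at each endpoint passes (the maximum is exactly 2), whereas the uniform design fails because f(1)ᵀ M⁻¹ f(1) = 2.5 > 2.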
Sparse Sliced Inverse Regression via Cholesky Matrix Penalization
CAS Zone 3, Mathematics
Statistica Sinica | Published: 2023-01-01 | DOI: 10.5705/ss.202020.0406
Linh Nghiem, Francis K.C. Hui, Samuel Mueller, A.H. Welsh
Abstract: We introduce a new sparse sliced inverse regression estimator, called Cholesky matrix penalization, and its adaptive version for achieving sparsity in estimating the dimensions of the central subspace. The new estimators use the Cholesky decomposition of the covariance matrix of the covariates and include a regularization term in the objective function to achieve sparsity in a computationally efficient manner. We establish the theoretical values of the tuning parameters that achieve estimation and variable selection consistency for the central subspace. Furthermore, we propose a new projection information criterion to select the tuning parameter for our proposed estimators and prove that the new criterion facilitates selection consistency. The Cholesky matrix penalization estimator inherits the strengths of the Matrix Lasso and the Lasso sliced inverse regression estimator; it has superior performance in numerical studies and can be adapted to other sufficient dimension reduction methods in the literature.
Citations: 1
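For context on what the Cholesky penalty is built on, here is a sketch of the classical SIR (Li 1991) candidate matrix M = Σ_h (n_h/n)(x̄_h − x̄)(x̄_h − x̄)ᵀ, computed on centered but unstandardized covariates: observations are sorted by the response and cut into slices, and the central subspace is classically estimated from the leading solutions of the generalized eigenproblem M v = λ Σ v. The sparse Cholesky-penalized estimator in this entry adds regularization on top of this, which the sketch does not attempt; `sir_kernel` is a hypothetical name, and the number of slices is left as a user choice.

```python
def sir_kernel(X, y, num_slices):
    """Classical SIR candidate matrix M = sum_h (n_h / n) (xbar_h - xbar)(xbar_h - xbar)^T.

    X: list of n covariate vectors (length p); y: list of n scalar responses.
    Observations are sorted by y and cut into num_slices slices of (nearly) equal size.
    """
    n, p = len(X), len(X[0])
    order = sorted(range(n), key=lambda i: y[i])
    xbar = [sum(row[a] for row in X) / n for a in range(p)]
    M = [[0.0] * p for _ in range(p)]
    for h in range(num_slices):
        idx = order[h * n // num_slices:(h + 1) * n // num_slices]
        if not idx:
            continue
        # Deviation of this slice's covariate mean from the overall mean.
        diff = [sum(X[i][a] for i in idx) / len(idx) - xbar[a] for a in range(p)]
        w = len(idx) / n
        for a in range(p):
            for b in range(p):
                M[a][b] += w * diff[a] * diff[b]
    return M
```

Changing `num_slices` changes M, which is the sensitivity to slicing that slicing-free alternatives aim to remove.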
That Prasad-Rao is Robust: Estimation of Mean Squared Prediction Error of Observed Best Predictor under Potential Model Misspecification
CAS Zone 3, Mathematics
Statistica Sinica | Published: 2023-01-01 | DOI: 10.5705/ss.202020.0325
Xiaohui Liu, Haiqiang Ma, Jiming Jiang
Citations: 1
Statistical Inference for Functional Time Series
CAS Zone 3, Mathematics
Statistica Sinica | Published: 2023-01-01 | DOI: 10.5705/ss.202021.0107
Jie Li, Lijian Yang
Citations: 1
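Contact us: info@booksci.cn. BookSci (Book学术) offers a free academic resource search service that helps scholars in China and abroad search Chinese- and English-language literature, and strives to provide the most convenient, high-quality service experience. Copyright © 2023 布克学术 (BookSci). All rights reserved.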