{"title":"Adaptive Randomization via Mahalanobis Distance","authors":"Yichen Qin, Y. Li, Wei Ma, Haoyu Yang, F. Hu","doi":"10.5705/ss.202020.0440","DOIUrl":"https://doi.org/10.5705/ss.202020.0440","url":null,"abstract":"In comparative studies, researchers often seek an optimal covariate balance. However, chance imbalance still exists in randomized experiments, and becomes more serious as the number of covariates increases. To address this issue, we introduce a new randomization procedure, called adaptive randomization via the Mahalanobis distance (ARM). The proposed method allocates units sequentially and adaptively, using information on the current level of imbalance and the incoming unit’s covariate. Theoretical results and numerical comparison show that with a large number of covariates or a large number of units, the proposed method shows substantial advantages over traditional methods in terms of the covariate balance, estimation accuracy, hypothesis testing power, and computational time. The proposed method attains the optimal covariate balance, in the sense that the estimated treatment effect attains its minimum variance asymptotically, and can be applied in both causal inference and clinical trials. Lastly, numerical stud...","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70936861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression Analysis of Spatially Correlated Event Durations With Missing Origins Annotated by Longitudinal Measures","authors":"Y. Xiong, W. J. Braun, T. Duchesne, X. J. Hu","doi":"10.5705/ss.202021.0118","DOIUrl":"https://doi.org/10.5705/ss.202021.0118","url":null,"abstract":"This paper is concerned with event durations in situations where the study units may be spatially correlated and the time origins of the events are missing. We develop regression models based on the partly observed durations with the aid of available longitudinal information. The first-hitting-time model (e.g. Lee and Whitmore, 2006) is employed to link the data of event durations and the associated longitudinal measures with shared random effects. We present procedures for estimating the model parameters and an induced estimator of the conditional distribution of the event duration. We apply the EM algorithm and Monte Carlo methods to compute the proposed estimators. We establish consistency and asymptotic normality of the estimators, and present their variance estimation. The proposed approach is illustrated with a collection of wildfire records from Alberta, Canada. Its performance is examined numerically and compared with two competitors via simulation.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Functional Quantile Regression","authors":"Boyi Hu, Xixi Hu, Hua Liu, Jinhong You, Jiguo Cao","doi":"10.5705/ss.202021.0248","DOIUrl":"https://doi.org/10.5705/ss.202021.0248","url":null,"abstract":"The conventional method for functional quantile regression (FQR) is to fit the regression model for each quantile of interest separately. Therefore, the slope function of the regression, as a bivariate function of time and quantile, is estimated as a univariate function of time for each fixed quantile. However, there are several limitations to this conventional strategy. For example, it cannot guarantee the monotonicity of the conditional quantiles, nor can it control the smoothness of the slope estimator as a bivariate function. In this paper, we propose a new framework for FQR, in which we simultaneously fit the FQR model for multiple quantiles, with the help of a bivariate basis under some constraints, such that the estimated quantiles satisfy the monotonicity conditions and the smoothness of the slope estimator is controlled. The proposed estimator for the slope function is shown to be asymptotically consistent, and we establish its asymptotic normality. We use simulation to evaluate the finite-sample performance of the proposed method and compare it with that of the conventional method. We demonstrate the proposed method by analyzing the effects of...","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Odds Rate Frailty Models for Current Status Data with Informative Censoring","authors":"Yang Xu, Shishun Zhao, T. Hu, Jianguo Sun","doi":"10.5705/ss.202021.0411","DOIUrl":"https://doi.org/10.5705/ss.202021.0411","url":null,"abstract":"Current-status data occur in many areas, and the analysis of such data has attracted much attention. In this study, we consider a regression analysis of current-status data in the presence of informative censoring, for which most existing methods either apply only to limited situations or are computationally unstable. Here, we propose a new sieve maximum likelihood estimation procedure under the class of semiparametric generalized odds rate frailty models. The proposed method uses a latent variable to describe the informative censoring, that is, the relationship between the failure time of interest and the censoring time. We develop a novel expectation-maximization algorithm for determining the proposed estimators, and establish their asymptotic consistency and normality. The results of a simulation study show that the proposed method performs well in practical...","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Inference for Mean Function of Longitudinal Imaging Data over Complicated Domains","authors":"Qirui Hu, Jie Li","doi":"10.5705/ss.202021.0415","DOIUrl":"https://doi.org/10.5705/ss.202021.0415","url":null,"abstract":"We propose a novel procedure for estimating the mean function of longitudinal imaging data with inherent spatial and temporal correlation. We depict the dependence between temporally ordered images using a functional moving average, and use flexible bivariate splines over triangulations to handle the irregular image domains that are common in imaging studies. We establish both the global and the local asymptotic properties of the bivariate spline estimator for the mean function, with simultaneous confidence corridors (SCCs) as a theoretical byproduct. Under some mild conditions, the proposed estimator and its accompanying SCCs are shown to be consistent and oracle efficient, as though all images were entirely observed without errors. We use Monte Carlo simulation experiments to demonstrate the finite-sample performance of the proposed method, the results of which strongly corroborate the asymptotic theory. The proposed method is further illustrated by analyzing two seawater potential temperature data sets.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group Testing Regression Analysis with Missing Data and Imperfect Tests","authors":"A. Delaigle, Ruoxu Tan","doi":"10.5705/ss.202021.0382","DOIUrl":"https://doi.org/10.5705/ss.202021.0382","url":null,"abstract":"Estimating the prevalence of an infectious disease in a large population typically requires testing a specimen (e.g., blood, urine, or swab) for the disease. When the disease spreads quickly, time constraints and limited resources often restrict the number of tests that can be performed. In such cases, if the prevalence is not too high, the group testing procedure can be employed to save time, money, and resources. The procedure tests pooled specimens of groups of individuals, rather than testing each individual for the disease. This technique is also used in other contexts, for example, to detect abnormalities or contamination in animals, plants, food, or water. Although methods exist for estimating a prevalence conditional on the explanatory variables from group testing data, they require specimens to be available for all individuals, which is not always possible. Therefore, we construct new nonparametric estimators that are consistent when some of the specimens are missing. We demonstrate the numerical performance of our methods using simulations and a hepatitis B example.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Linear Errors-in-Variables Model with Unknown Heteroscedastic Measurement Errors","authors":"L. Nghiem, Cornelis J. Potgieter","doi":"10.5705/ss.202022.0331","DOIUrl":"https://doi.org/10.5705/ss.202022.0331","url":null,"abstract":"In the classic measurement error framework, covariates are contaminated by independent additive noise. This paper considers parameter estimation in such a linear errors-in-variables model where the unknown measurement error distribution is heteroscedastic across observations. We propose a new generalized method of moments (GMM) estimator that combines a moment correction approach and a phase function-based approach. The former requires distributions to have four finite moments, while the latter relies on covariates having asymmetric distributions. The new estimator is shown to be consistent and asymptotically normal under appropriate regularity conditions. The asymptotic covariance of the estimator is derived, and the estimated standard error is computed using a fast bootstrap procedure. The GMM estimator is demonstrated to have strong finite sample performance in numerical studies, especially when the measurement errors follow non-Gaussian distributions.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139315809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of random integration to test equality of high dimensional covariance matrices","authors":"Yunlu Jiang, Canhong Wen, Yukang Jiang, Xueqin Wang, Heping Zhang","doi":"10.5705/ss.202020.0486","DOIUrl":"https://doi.org/10.5705/ss.202020.0486","url":null,"abstract":"Testing the equality of two covariance matrices is a fundamental problem in statistics, and is especially challenging when the data are high-dimensional. Through a novel use of random integration, we can test the equality of high-dimensional covariance matrices without assuming parametric distributions for the two underlying populations, even if the dimension is much larger than the sample size. The asymptotic properties of our test for an arbitrary number of covariates and sample size are studied in depth under a general multivariate model. The finite-sample performance of our test is evaluated through numerical studies. The empirical results demonstrate that our test is highly competitive with existing tests in a wide range of settings. In particular, our proposed test is distinctly powerful under different settings when there exist a few large or many small diagonal disturbances between the two covariance matrices.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10550010/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41162333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leverage Classifier: Another Look at Support Vector Machine","authors":"Yixin Han, Jun Yu, Nan Zhang, Cheng Meng, Ping Ma, Wenxuan Zhong, Changliang Zou","doi":"10.5705/ss.202023.0124","DOIUrl":"https://doi.org/10.5705/ss.202023.0124","url":null,"abstract":"Support vector machine (SVM) is a popular classifier known for accuracy, flexibility, and robustness. However, its intensive computation has hindered its application to large-scale datasets. In this paper, we propose a new optimal leverage classifier based on linear SVM under a nonseparable setting. Our classifier aims to select an informative subset of the training sample to reduce data size, enabling efficient computation while maintaining high accuracy. We take a novel view of SVM under the general subsampling framework and rigorously investigate the statistical properties. We propose a two-step subsampling procedure consisting of a pilot estimation of the optimal subsampling probabilities and a subsampling step to construct the classifier. We develop a new Bahadur representation of the SVM coefficients and derive the unconditional asymptotic distribution and the optimal subsampling probabilities without conditioning on the full sample. Numerical results demonstrate that our classifiers outperform the existing methods in terms of estimation, computation, and prediction.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48579241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Unbiased Predictor for Skewed Response Variable with Measurement Error in Covariate","authors":"Sepideh Mosaferi, M. Ghosh, S. Sugasawa","doi":"10.5705/ss.202023.0098","DOIUrl":"https://doi.org/10.5705/ss.202023.0098","url":null,"abstract":"We introduce a new small area predictor when the Fay-Herriot normal error model is fitted to a logarithmically transformed response variable, and the covariate is measured with error. This framework has been previously studied by Mosaferi et al. (2023). The empirical predictor given in their manuscript cannot perform uniformly better than the direct estimator. Our proposed predictor in this manuscript is unbiased and can perform uniformly better than the one proposed in Mosaferi et al. (2023). We derive an approximation of the mean squared error (MSE) for the predictor. The prediction intervals based on the MSE suffer from coverage problems. Thus, we propose a nonparametric bootstrap prediction interval, which is more accurate. This problem is of great interest in small area applications, since statistical agencies and agricultural surveys are often asked to produce estimates of right-skewed variables with covariates measured with errors. With Monte Carlo simulation studies and two Census Bureau data sets, we demonstrate the superiority of our proposed methodology.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47397122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}