{"title":"New highly efficient high-breakdown estimator of multivariate scatter and location for elliptical distributions","authors":"Justin Fishbone, Lamine Mili","doi":"10.1002/cjs.11770","DOIUrl":"10.1002/cjs.11770","url":null,"abstract":"<p>High-breakdown-point estimators of multivariate location and shape matrices, such as the <math><mtext>MM</mtext></math>-estimator with smoothed hard rejection and the Rocke <math><mi>S</mi></math>-estimator, are generally designed to have high efficiency for Gaussian data. However, many phenomena are non-Gaussian, and these estimators can therefore have poor efficiency. This article proposes a new tunable <math><mi>S</mi></math>-estimator, termed the <math><msub><mi>S</mi><mi>q</mi></msub></math>-estimator, for the general class of symmetric elliptical distributions, a class containing many common families such as the multivariate Gaussian, <math><mi>t</mi></math>-, Cauchy, Laplace, hyperbolic, and normal inverse Gaussian distributions. Across this class, the <math><msub><mi>S</mi><mi>q</mi></msub></math>-estimator is shown to generally provide higher maximum efficiency than other leading high-breakdown estimators while maintaining the maximum breakdown point. Furthermore, the <math><msub><mi>S</mi><mi>q</mi></msub></math>-estimator is demonstrated to be distributionally robust, and its robustness to outliers is demonstrated to be on par with these leading estimators while also being more stable with respect to initial conditions. From a practical viewpoint, these properties make the <math><msub><mi>S</mi><mi>q</mi></msub></math>-estimator broadly applicable for practitioners. These advantages are demonstrated with an example application: the minimum-variance optimal allocation of financial portfolio investments.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 2","pages":"437-460"},"PeriodicalIF":0.6,"publicationDate":"2023-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11770","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44556703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal multiwave validation of secondary use data with outcome and exposure misclassification","authors":"Sarah C. Lotspeich, Gustavo G. C. Amorim, Pamela A. Shaw, Ran Tao, Bryan E. Shepherd","doi":"10.1002/cjs.11772","DOIUrl":"10.1002/cjs.11772","url":null,"abstract":"<p>Observational databases provide unprecedented opportunities for secondary use in biomedical research. However, these data can be error-prone and must be validated before use. It is usually unrealistic to validate the whole database because of resource constraints. A cost-effective alternative is a two-phase design that validates a subset of records enriched for information about a particular research question. We consider odds ratio estimation under differential outcome and exposure misclassification and propose optimal designs that minimize the variance of the maximum likelihood estimator. Our adaptive grid search algorithm can locate the optimal design in a computationally feasible manner. Because the optimal design relies on unknown parameters, we introduce a multiwave strategy to approximate the optimal design. We demonstrate the proposed design's efficiency gains through simulations and two large observational studies.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 2","pages":"532-554"},"PeriodicalIF":0.6,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11772","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46055319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A class of space-filling designs with low-dimensional stratification and column orthogonality","authors":"Pengnan Li, Fasheng Sun","doi":"10.1002/cjs.11761","DOIUrl":"10.1002/cjs.11761","url":null,"abstract":"<p>Strong orthogonal arrays are suitable designs for computer experiments because of stratification in low-dimensional projections. However, strong orthogonal arrays may be very expensive for a moderate number of factors. In this article, we develop a method for constructing space-filling designs with more economical run sizes. These designs are not only column-orthogonal but also enjoy a large proportion of low-dimensional stratification properties that strong orthogonal arrays ought to have. Moreover, a class of proposed designs can be 3-orthogonal. In addition, some theoretical results on regular fractional factorial designs are obtained as a by-product.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"310-326"},"PeriodicalIF":0.6,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44562952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Penalized complexity priors for the skewness parameter of power links","authors":"José A. Ordoñez, Marcos O. Prates, Jorge L. Bazán, Victor H. Lachos","doi":"10.1002/cjs.11769","DOIUrl":"10.1002/cjs.11769","url":null,"abstract":"<p>The choice of a prior distribution is a key aspect of the Bayesian method. However, in many cases, such as the family of power links, this is not trivial. In this article, we introduce a penalized complexity prior (PC prior) of the skewness parameter for this family, which is useful for dealing with imbalanced data. We derive a general expression for this density and show its usefulness for some particular cases such as the power logit and the power probit links. A simulation study and a real data application are used to assess the efficiency of the introduced densities in comparison with the Gaussian and uniform priors. Results show improvement in point and credible interval estimation for the considered models when using the PC prior in comparison to other well-known standard priors.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"98-117"},"PeriodicalIF":0.6,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48663145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust nonparametric hypothesis tests for differences in the covariance structure of functional data","authors":"Kelly Ramsay, Shoja'eddin Chenouri","doi":"10.1002/cjs.11767","DOIUrl":"10.1002/cjs.11767","url":null,"abstract":"<p>We develop a group of robust, nonparametric hypothesis tests that detect differences between the covariance operators of several populations of functional data. These tests, called functional Kruskal–Wallis tests for covariance, or FKWC tests, are based on functional data depth ranks. FKWC tests work well even when the data are heavy-tailed, which is shown both in simulation and theory. FKWC tests offer several other benefits: they have a simple asymptotic distribution under the null hypothesis, they are computationally cheap, and they possess transformation-invariance properties. We show that under general alternative hypotheses, these tests are consistent under mild, nonparametric assumptions. As a result, we introduce a new functional depth function called <math><msup><mi>L</mi><mn>2</mn></msup></math>-root depth that works well for the purposes of detecting differences in magnitude between covariance kernels. We present an analysis of the FKWC test based on <math><msup><mi>L</mi><mn>2</mn></msup></math>-root depth under local alternatives. Through simulations, when the true covariance kernels have an infinite number of positive eigenvalues, we show that these tests have higher power than their competitors while maintaining their nominal size. We also provide a method for computing sample size and performing multiple comparisons.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"43-78"},"PeriodicalIF":0.6,"publicationDate":"2023-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47021433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acknowledgement of Referees' Services Remerciements aux membres des jurys","authors":"","doi":"10.1002/cjs.11766","DOIUrl":"10.1002/cjs.11766","url":null,"abstract":"Aeberhard, William H. ETH Zürich Asgharian, Masoud McGill University Bahraoui, Tarik* Université du Québec à Montréal Battey, Heather Imperial College London Bédard, Mylène Université de Montréal Bellhouse, David* University of Western Ontario Berger, Yves* University of Southampton Braekers, Roel Hasselt University Brazzale, Alessandra University of Padova Cai, Song Carleton University Cao, Guanqun Auburn University Casa, Alessandro Free University of Bozen-Bolzano Chatterjee, Kashinath* Visva-Bharati University Chen, Baojiang University of Texas Health Science Center Chen, Guanhua University of Wisconsin-Madison Chen, Sixia University of Oklahoma Health Sciences Center Chen, Yaqing* University of California Davis Cheng, Yu University of Pittsburgh Cheung, Rex San Francisco State University Coia, Vincenzo University of British Columbia Cook, Richard University of Waterloo Csató, László ELKH SZTAKI Dagne, Getachew University of South Florida Dai, Ben Chinese University of Hong Kong","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"51 1","pages":"344-349"},"PeriodicalIF":0.6,"publicationDate":"2023-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42178036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCA Rerandomization","authors":"Hengtao Zhang, Guosheng Yin, Donald B. Rubin","doi":"10.1002/cjs.11765","DOIUrl":"10.1002/cjs.11765","url":null,"abstract":"<p>Mahalanobis distance of covariate means between treatment and control groups is often adopted as a balance criterion when implementing a rerandomization strategy. However, this criterion may not work well for high-dimensional cases because it balances all orthogonalized covariates equally. We propose using principal component analysis (PCA) to identify proper subspaces in which Mahalanobis distance should be calculated. Not only can PCA effectively reduce the dimensionality for high-dimensional covariates, but it also provides computational simplicity by focusing on the top orthogonal components. The PCA rerandomization scheme has desirable theoretical properties for balancing covariates and thereby improving the estimation of average treatment effects. This conclusion is supported by numerical studies using both simulated and real examples.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"5-25"},"PeriodicalIF":0.6,"publicationDate":"2023-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11765","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44206441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Method of model checking for case II interval-censored data under the additive hazards model","authors":"Yanqin Feng, Ming Tang, Jieli Ding","doi":"10.1002/cjs.11759","DOIUrl":"10.1002/cjs.11759","url":null,"abstract":"<p>General or case II interval-censored data are commonly encountered in practice. We develop methods for model-checking and goodness-of-fit testing for the additive hazards model with case II interval-censored data. We propose test statistics based on the supremum of the stochastic processes derived from the cumulative sum of martingale-based residuals over time and covariates. We approximate the distribution of the stochastic process via a simulation technique to conduct a class of graphical and numerical techniques for various purposes of model-fitting evaluations. Simulation studies are conducted to assess the finite-sample performance of the proposed method. A real dataset from an AIDS observational study is analyzed for illustration.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"212-236"},"PeriodicalIF":0.6,"publicationDate":"2023-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48612832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subgroup analysis of linear models with measurement error","authors":"Yuan Le, Yang Bai, Guoyou Qin","doi":"10.1002/cjs.11763","DOIUrl":"10.1002/cjs.11763","url":null,"abstract":"<p>Heterogeneity exists in populations, and people may benefit differently from the same treatments or services. Correctly identifying subgroups corresponding to outcomes such as treatment response plays an important role in data-based decision making. As few discussions exist on subgroup analysis with measurement error, we propose a new estimation method to consider these two components simultaneously under the linear regression model. First, we develop an objective function based on unbiased estimating equations with two repeated measurements and a concave penalty on pairwise differences between coefficients. The proposed method can identify subgroups and estimate coefficients simultaneously when considering measurement error. Second, we derive an algorithm based on the alternating direction method of multipliers algorithm and demonstrate its convergence. Third, we prove that the proposed estimators are consistent and asymptotically normal. The performance and asymptotic properties of the proposed method are evaluated through simulation studies. Finally, we apply our method to data from the Lifestyle Education for Activity and Nutrition study and identify two subgroups, of which one has a significant treatment effect.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"26-42"},"PeriodicalIF":0.6,"publicationDate":"2023-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47687270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed sequential estimation procedures","authors":"Zhuojian Chen, Zhanfeng Wang, Yuan-chin Ivan Chang","doi":"10.1002/cjs.11762","DOIUrl":"10.1002/cjs.11762","url":null,"abstract":"<p>Data collected from distributed sources or sites commonly have different distributions or contaminated observations. Active learning procedures allow us to assess data when recruiting new data into model building. Thus, combining several active learning procedures together is a promising idea, even when the collected data set is contaminated. Here, we study how to conduct and integrate several adaptive sequential procedures at a time to produce a valid result via several machines or a parallel-computing framework. To avoid distraction by complicated modelling processes, we use confidence set estimation for linear models to illustrate the proposed method and discuss the approach's statistical properties. We then evaluate its performance using both synthetic and real data. We have implemented our method using Python and made it available through Github at https://github.com/zhuojianc/dsep.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"271-290"},"PeriodicalIF":0.6,"publicationDate":"2023-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43825926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}