{"title":"Inverse‐probability‐weighted logrank test for stratified survival data with missing measurements","authors":"Rim Ben Elouefi, Foued Saâdaoui","doi":"10.1111/stan.12276","DOIUrl":"https://doi.org/10.1111/stan.12276","url":null,"abstract":"The stratified logrank test can be used to compare survival distributions of several groups of patients, while adjusting for the effect of some discrete variable that may be predictive of the survival outcome. In practice, it can happen that this discrete variable is missing for some patients. An inverse‐probability‐weighted version of the stratified logrank statistic is introduced to tackle this issue. Its asymptotic distribution is derived under the null hypothesis of equality of the survival distributions. A simulation study is conducted to assess behavior of the proposed test statistic in finite samples. An analysis of a medical dataset illustrates the methodology.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"29 1","pages":"113 - 129"},"PeriodicalIF":1.5,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82520985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing replicability with the sceptical p$$ p $$ ‐value: Type‐I error control and sample size planning","authors":"Charlotte Micheloud, F. Balabdaoui, L. Held","doi":"10.1111/stan.12312","DOIUrl":"https://doi.org/10.1111/stan.12312","url":null,"abstract":"We study a statistical framework for replicability based on a recently proposed quantitative measure of replication success, the sceptical p$$ p $$ ‐value. A recalibration is proposed to obtain exact overall Type‐I error control if the effect is null in both studies and additional bounds on the partial and conditional Type‐I error rate, which represent the case where only one study has a null effect. The approach avoids the double dichotomization for significance of the two‐trials rule and has larger project power to detect existing effects over both studies in combination. It can also be used for power calculations and requires a smaller replication sample size than the two‐trials rule for already convincing original studies. We illustrate the performance of the proposed methodology in an application to data from the Experimental Economics Replication Project.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"77 1","pages":"573 - 591"},"PeriodicalIF":1.5,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83870470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic bias correction for testing in high‐dimensional linear models","authors":"Jing Zhou, G. Claeskens","doi":"10.1111/stan.12274","DOIUrl":"https://doi.org/10.1111/stan.12274","url":null,"abstract":"Hypothesis testing is challenging due to the test statistic's complicated asymptotic distribution when it is based on a regularized estimator in high dimensions. We propose a robust testing framework for ℓ1$$ {ell}_1 $$ ‐regularized M‐estimators to cope with non‐Gaussian distributed regression errors, using the robust approximate message passing algorithm. The proposed framework enjoys an automatically built‐in bias correction and is applicable with general convex nondifferentiable loss functions which also allows inference when the focus is a conditional quantile instead of the mean of the response. The estimator compares numerically well with the debiased and desparsified approaches while using the least squares loss function. The use of the Huber loss function demonstrates that the proposed construction provides stable confidence intervals under different regression error distributions.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"57 1","pages":"71 - 98"},"PeriodicalIF":1.5,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86790588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing skewness in financial markets","authors":"Giovanni Campisi, L. La Rocca, S. Muzzioli","doi":"10.1111/stan.12273","DOIUrl":"https://doi.org/10.1111/stan.12273","url":null,"abstract":"It is a matter of common observation that investors value substantial gains but are averse to heavy losses. Obvious as it may sound, this translates into an interesting preference for right‐skewed return distributions, whose right tails are heavier than their left tails. Skewness is thus not only a way to describe the shape of a distribution, but also a tool for risk measurement. We review the statistical literature on skewness and provide a comprehensive framework for its assessment. Then, we present a new measure of skewness, based on the decomposition of variance in its upward and downward components. We argue that this measure fills a gap in the literature and show in a simulation study that it strikes a good balance between robustness and sensitivity.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"61 1","pages":"48 - 70"},"PeriodicalIF":1.5,"publicationDate":"2022-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88552120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autoregressive and moving average models for zero‐inflated count time series","authors":"Vurukonda Sathish, S. Mukhopadhyay, R. Tiwari","doi":"10.1111/stan.12255","DOIUrl":"https://doi.org/10.1111/stan.12255","url":null,"abstract":"Zero inflation is a common nuisance while monitoring disease progression over time. This article proposes a new observation‐driven model for zero‐inflated and over‐dispersed count time series. The counts given from the past history of the process and available information on covariates are assumed to be distributed as a mixture of a Poisson distribution and a distribution degenerated at zero, with a time‐dependent mixing probability, πt . Since, count data usually suffers from overdispersion, a Gamma distribution is used to model the excess variation, resulting in a zero‐inflated negative binomial regression model with mean parameter λt . Linear predictors with autoregressive and moving average (ARMA) type terms, covariates, seasonality and trend are fitted to λt and πt through canonical link generalized linear models. Estimation is done using maximum likelihood aided by iterative algorithms, such as Newton‐Raphson (NR) and Expectation and Maximization. Theoretical results on the consistency and asymptotic normality of the estimators are given. The proposed model is illustrated using in‐depth simulation studies and two disease datasets.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"36 1 1","pages":"190 - 218"},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79903003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Threshold estimation for continuous three‐phase polynomial regression models with constant mean in the middle regime","authors":"Chih‐Hao Chang, Kam-Fai Wong, Wei‐Yee Lim","doi":"10.1111/stan.12268","DOIUrl":"https://doi.org/10.1111/stan.12268","url":null,"abstract":"This paper considers a continuous three‐phase polynomial regression model with two threshold points for dependent data with heteroscedasticity. We assume the model is polynomial of order zero in the middle regime, and is polynomial of higher orders elsewhere. We denote this model by ℳ2$$ {mathcal{M}}_2 $$ , which includes models with one or no threshold points, denoted by ℳ1$$ {mathcal{M}}_1 $$ and ℳ0$$ {mathcal{M}}_0 $$ , respectively, as special cases. We provide an ordered iterative least squares (OiLS) method when estimating ℳ2$$ {mathcal{M}}_2 $$ and establish the consistency of the OiLS estimators under mild conditions. When the underlying model is ℳ1$$ {mathcal{M}}_1 $$ and is (d0−1)$$ left({d}_0-1right) $$ th‐order differentiable but not d0$$ {d}_0 $$ th‐order differentiable at the threshold point, we further show the Op(N−1/(d0+2))$$ {O}_pleft({N}^{-1/left({d}_0+2right)}right) $$ convergence rate of the OiLS estimators, which can be faster than the Op(N−1/(2d0))$$ {O}_pleft({N}^{-1/left(2{d}_0right)}right) $$ convergence rate given in Feder when d0≥3$$ {d}_0ge 3 $$ . We also apply a model‐selection procedure for selecting ℳκ$$ {mathcal{M}}_{kappa } $$ ; κ=0,1,2$$ kappa =0,1,2 $$ . When the underlying model exists, we establish the selection consistency under the aforementioned conditions. 
Finally, we conduct simulation experiments to demonstrate the finite‐sample performance of our asymptotic results.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"13 1","pages":"4 - 47"},"PeriodicalIF":1.5,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87649922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal subsampling for multiplicative regression with massive data","authors":"Tianzhen Wang, Haixiang Zhang","doi":"10.1111/stan.12266","DOIUrl":"https://doi.org/10.1111/stan.12266","url":null,"abstract":"Faced with massive data, subsampling is a popular way to downsize the data volume for reducing computational burden. The key idea of subsampling is to perform statistical analysis on a representative subsample drawn from the full data. It provides a practical solution to extracting useful information from big data. In this article, we develop an efficient subsampling method for large‐scale multiplicative regression model, which can largely reduce the computational burden due to massive data. Under some regularity conditions, we establish consistency and asymptotic normality of the subsample‐based estimator, and derive the optimal subsampling probabilities according to the L‐optimality criterion. A two‐step algorithm is developed to approximate the optimal subsampling procedure. Meanwhile, the convergence rate and asymptotic normality of the two‐step subsample estimator are established. Numerical studies and two real data applications are carried out to evaluate the performance of our subsampling method.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"11 1","pages":"418 - 449"},"PeriodicalIF":1.5,"publicationDate":"2022-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89804072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change-point analysis through integer-valued autoregressive process with application to some COVID-19 data.","authors":"Subhankar Chattopadhyay, Raju Maiti, Samarjit Das, Atanu Biswas","doi":"10.1111/stan.12251","DOIUrl":"https://doi.org/10.1111/stan.12251","url":null,"abstract":"<p><p>In this article, we consider the problem of change-point analysis for the count time series data through an integer-valued autoregressive process of order 1 (INAR(1)) with time-varying covariates. These types of features we observe in many real-life scenarios especially in the COVID-19 data sets, where the number of active cases over time starts falling and then again increases. In order to capture those features, we use Poisson INAR(1) process with a time-varying smoothing covariate. By using such model, we can model both the components in the active cases at time-point <i>t</i> namely, (i) number of nonrecovery cases from the previous time-point and (ii) number of new cases at time-point <i>t</i>. We study some theoretical properties of the proposed model along with forecasting. Some simulation studies are performed to study the effectiveness of the proposed method. 
Finally, we analyze two COVID-19 data sets and compare our proposed model with another PINAR(1) process which has time-varying covariate but no change-point, to demonstrate the overall performance of our proposed model.</p>","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"76 1","pages":"4-34"},"PeriodicalIF":1.5,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/stan.12251","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39154751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Average ordinary least squares‐centered penalized regression: A more efficient way to address multicollinearity than ridge regression","authors":"Wei Wang, Linjiang Li, Sheng Li, F. Yin, Fang Liao, Zhang Tao, Xiaosong Li, Xiong Xiao, Yue Ma","doi":"10.1111/stan.12263","DOIUrl":"https://doi.org/10.1111/stan.12263","url":null,"abstract":"We developed a novel method to address multicollinearity in linear models called average ordinary least squares (OLS)‐centered penalized regression (AOPR). AOPR penalizes the cost function to shrink the estimators toward the weighted‐average OLS estimator. The commonly used ridge regression (RR) shrinks the estimators toward zero, that is, employs penalization prior β∼N(0,1/k) in the Bayesian view, which contradicts the common real prior β≠0 . Therefore, RR selects small penalization coefficients to relieve such a contradiction and thus makes the penalizations inadequate. Mathematical derivations remind us that AOPR could increase the performance of RR and OLS regression. A simulation study shows that AOPR obtains more accurate estimators than OLS regression in most situations and more accurate estimators than RR when the signs of the true β s are identical and is slightly less accurate than RR when the signs of the true β s are different. Additionally, a case study shows that AOPR obtains more stable estimators and stronger statistical power and predictive ability than RR and OLS regression. 
Through these results, we recommend using AOPR to address multicollinearity more efficiently than RR and OLS regression, especially when the true β s have identical signs.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"88 1","pages":"347 - 368"},"PeriodicalIF":1.5,"publicationDate":"2022-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81131506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction intervals for all of M future observations based on linear random effects models","authors":"M. Menssen, F. Schaarschmidt","doi":"10.1111/stan.12260","DOIUrl":"https://doi.org/10.1111/stan.12260","url":null,"abstract":"In many pharmaceutical and biomedical applications such as assay validation, assessment of historical control data, or the detection of anti‐drug antibodies, the calculation and interpretation of prediction intervals (PI) is of interest. The present study provides two novel methods for the calculation of prediction intervals based on linear random effects models and restricted maximum likelihood (REML) estimation. Unlike other REML‐based PI found in the literature, both intervals reflect the uncertainty related with the estimation of the prediction variance. The first PI is based on Satterthwaite approximation. For the other PI, a bootstrap calibration approach that we will call quantile‐calibration was used. Due to the calibration process this PI can be easily computed for more than one future observation and based on balanced and unbalanced data as well. In order to compare the coverage probabilities of the proposed PI with those of four intervals found in the literature, Monte Carlo simulations were run for two relatively complex random effects models and a broad range of parameter settings. 
The quantile‐calibrated PI was implemented in the statistical software R and is available in the predint package.","PeriodicalId":51178,"journal":{"name":"Statistica Neerlandica","volume":"66 1","pages":"283 - 308"},"PeriodicalIF":1.5,"publicationDate":"2021-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77968796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}