{"title":"The impact of misclassification on covariate-adaptive randomized clinical trials with generalized linear models","authors":"Tong Wang, Wei Ma","doi":"10.1016/j.jspi.2024.106209","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106209","url":null,"abstract":"<div><p>Covariate-adaptive randomization (CAR) is a class of randomization methods that use covariate information to enhance the comparability between treatment groups. Under such randomization, covariates are usually well balanced, i.e., the imbalance between the treatment and placebo groups is controlled. In practice, however, covariates are sometimes misclassified. Covariate misclassification affects both the CAR procedure itself and statistical inference after CAR. In this paper, we examine the impact of covariate misclassification on CAR from two aspects. First, we study the balancing properties of CAR with unequal allocation in the presence of covariate misclassification. We derive the convergence rate of the imbalance and compare it with the rate under the true covariate. Second, we study hypothesis testing under CAR with misclassified covariates in a generalized linear model (GLM) framework, considering both unadjusted and adjusted models. To illustrate the theoretical results, we discuss the validity of test procedures for three commonly used GLMs: logistic regression, Poisson regression, and the exponential model. In particular, we show that the adjusted model is often invalid when misclassified covariates are adjusted for. In this case, we provide a simple correction for the inflated Type I error. The correction is useful and easy to implement because it requires neither specification of the misclassification mechanism nor estimation of the misclassification rate. Our study enriches the literature on the impact of covariate misclassification on CAR and provides a practical approach for handling misclassification.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106209"},"PeriodicalIF":0.8,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141593759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A zero-estimator approach for estimating the signal level in a high-dimensional model-free setting","authors":"Ilan Livne, David Azriel, Yair Goldberg","doi":"10.1016/j.jspi.2024.106207","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106207","url":null,"abstract":"<div><p>We study a high-dimensional regression setting under the assumption of a known covariate distribution. We aim to estimate the amount of variation in the response explained by the best linear function of the covariates (the signal level). In our setting, we assume neither sparsity of the coefficient vector, normality of the covariates, nor linearity of the conditional expectation. We present an unbiased and consistent estimator and then improve it using a zero-estimator approach, where a zero-estimator is a statistic whose expected value is zero. More generally, we present an algorithm based on the zero-estimator approach that can, in principle, improve any given estimator. We study some asymptotic properties of the proposed estimators and demonstrate their finite-sample performance in a simulation study.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106207"},"PeriodicalIF":0.8,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141482213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Layer sparsity in neural networks","authors":"Mohamed Hebiri , Johannes Lederer , Mahsa Taheri","doi":"10.1016/j.jspi.2024.106195","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106195","url":null,"abstract":"<div><p>Sparsity has become popular in machine learning because it can save computational resources, facilitate interpretations, and prevent overfitting. This paper discusses sparsity in the framework of neural networks. In particular, we formulate a new notion of sparsity, called layer sparsity, that concerns the networks’ layers and, therefore, aligns particularly well with the current trend toward deep networks. We then introduce corresponding regularization and refitting schemes that can complement standard deep-learning pipelines to generate more compact and accurate networks.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106195"},"PeriodicalIF":0.9,"publicationDate":"2024-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0378375824000521/pdfft?md5=b1aa1392925da05f5ac50fc5d4831546&pid=1-s2.0-S0378375824000521-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141323230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High dimensional discriminant rules with shrinkage estimators of the covariance matrix and mean vector","authors":"Jaehoan Kim , Junyong Park , Hoyoung Park","doi":"10.1016/j.jspi.2024.106199","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106199","url":null,"abstract":"<div><p>Linear discriminant analysis (LDA) is a typical method for classification problems with large dimensions and small samples. Various LDA methods exist, based on different estimators of the covariance matrices and mean vectors. In this paper, we consider shrinkage methods based on a non-parametric approach. For the precision matrix, methods based on the sparsity structure or data splitting are examined. For the estimation of mean vectors, Non-parametric Empirical Bayes (NPEB) and Non-parametric Maximum Likelihood Estimation (NPMLE) methods, also known as <span><math><mi>f</mi></math></span>-modeling and <span><math><mi>g</mi></math></span>-modeling, respectively, are adopted. The performance of linear discriminant rules based on combined estimation strategies for the covariance matrix and mean vectors is analyzed in this study. In particular, the study presents a theoretical result on the performance of the NPEB method and compares it with previous studies. Simulation studies with various covariance matrix and mean vector structures are conducted to evaluate the methods discussed in this paper. Furthermore, real data examples such as gene expression and EEG data are also presented.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106199"},"PeriodicalIF":0.9,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141422982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixed-integer linear programming for computing optimal experimental designs","authors":"Radoslav Harman, Samuel Rosa","doi":"10.1016/j.jspi.2024.106200","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106200","url":null,"abstract":"<div><p>The problem of computing an exact experimental design that is optimal for the least-squares estimation of the parameters of a regression model is considered. We show that this problem can be solved via mixed-integer linear programming (MILP) for a wide class of optimality criteria, including the criteria of A-, I-, G- and MV-optimality. This approach improves upon the current state-of-the-art mathematical programming formulation, which uses mixed-integer second-order cone programming. The key idea underlying the MILP formulation is McCormick relaxation, which critically depends on finite interval bounds for the elements of the covariance matrix of the least-squares estimator corresponding to an optimal exact design. We provide both analytic and algorithmic methods for constructing these bounds. We also demonstrate the unique advantages of the MILP approach, such as the possibility of incorporating multiple design constraints into the optimization problem, including constraints on the variances and covariances of the least-squares estimator.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106200"},"PeriodicalIF":0.9,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141323229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some results for stochastic orders and aging properties related to the Laplace transform","authors":"Lazaros Kanellopoulos, Konstadinos Politis","doi":"10.1016/j.jspi.2024.106197","DOIUrl":"10.1016/j.jspi.2024.106197","url":null,"abstract":"<div><p>We study some properties and relations for stochastic orders and aging classes related to the Laplace transform. In particular, we show that the NBU<span><math><msub><mrow></mrow><mrow><mtext>Lt</mtext></mrow></msub></math></span> class of distributions is closed under convolution. We also obtain results for the ratio of derivatives of the Laplace transform between two distributions.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106197"},"PeriodicalIF":0.9,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141403038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical theory for image classification using deep convolutional neural network with cross-entropy loss under the hierarchical max-pooling model","authors":"Michael Kohler , Sophie Langer","doi":"10.1016/j.jspi.2024.106188","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106188","url":null,"abstract":"<div><p>Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to improve the theoretical understanding of neural networks. Nevertheless, this understanding seems limited when the networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analysing the rate of the excess risk of a CNN classifier trained by cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence which is independent of the dimension of the image. These rates are in line with practical observations about CNNs.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106188"},"PeriodicalIF":0.9,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0378375824000454/pdfft?md5=68a8b5f0ef9e0563ac8f09f8ca152533&pid=1-s2.0-S0378375824000454-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141422984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of large four-level designs via quaternary codes","authors":"Xiangyu Fang , Hongyi Li , Zujun Ou","doi":"10.1016/j.jspi.2024.106198","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106198","url":null,"abstract":"<div><p>In this paper, two simple and effective construction methods are proposed for building four-level designs of large size via quaternary codes from small two-level initial designs. Under popular criteria for selecting optimal designs, such as generalized minimum aberration, minimum moment aberration and uniformity measured by the average Lee discrepancy, the close relationships between a constructed four-level design and its initial design are investigated, providing guidance for choosing a suitable initial design. Moreover, some lower bounds on the average Lee discrepancy of the constructed four-level designs are obtained, which can serve as benchmarks for evaluating their uniformity. Numerical examples show that large four-level designs can be constructed with high efficiency.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106198"},"PeriodicalIF":0.9,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141302737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Integrative Analysis via Quantile Regression with Homogeneity and Sparsity","authors":"Hao Zeng , Chuang Wan , Wei Zhong , Tuo Liu","doi":"10.1016/j.jspi.2024.106196","DOIUrl":"10.1016/j.jspi.2024.106196","url":null,"abstract":"<div><p>Integrative analysis plays a critical role in combining heterogeneous data from multiple datasets to provide a comprehensive view of the overall data features. However, in multiple datasets, outliers and heavy-tailed data can render least squares estimation unreliable. In response, we propose Robust Integrative Analysis via Quantile Regression (RIAQ), which accounts for homogeneity and sparsity in multiple datasets. The RIAQ approach is able not only to identify latent homogeneous coefficient structures but also to recover the sparsity of high-dimensional covariates via double penalty terms. The integration of sample information across multiple datasets improves estimation efficiency, while a sparse model improves model interpretability. Furthermore, quantile regression allows the detection of subgroup structures at different quantile levels, providing a comprehensive picture of the relationship between the response and high-dimensional covariates. We develop an efficient alternating direction method of multipliers (ADMM) algorithm to solve the optimization problem and study its convergence. We also derive the parameter selection consistency of the modified Bayesian information criterion. Numerical studies demonstrate that our proposed estimator has satisfactory finite-sample performance, especially in heavy-tailed cases.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106196"},"PeriodicalIF":0.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141282198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing truncation dependence: The Gumbel–Barnett copula","authors":"Anne-Marie Toparkus, Rafael Weißbach","doi":"10.1016/j.jspi.2024.106194","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106194","url":null,"abstract":"<div><p>In studies on lifetimes, the population occasionally contains statistical units that were born before data collection started. Units that died before this start are left-truncated. For all other units, the age at the study start is often recorded, and we aim to test whether this second measurement is independent of the genuine measure of interest, the lifetime. Our basic model of dependence is the one-parameter Gumbel–Barnett copula. For simplicity, the marginal distribution of the lifetime is assumed to be exponential, and for the age at study start, namely the distribution of birth dates, we assume a uniform distribution. Also for simplicity, and to fit our application, we assume that units that die after our study period are also truncated. Using a result from point process theory, we can approximate the truncated sample by a Poisson process and thereby derive its likelihood. Identification, consistency and the asymptotic distribution of the maximum-likelihood estimator are derived. Testing for positive truncation dependence must include the hypothesis of independence, which coincides with the boundary of the copula’s parameter space. By non-standard theory, the maximum likelihood estimator of the exponential and copula parameters is distributed as a mixture of a two- and a one-dimensional normal distribution. For the proof, the third parameter, the unobservable sample size, is profiled out. Interestingly, viewing the data as a truncated sample differs from viewing them as a simple sample from the truncated population, but not by much. The application concerns 55 thousand double-truncated lifetimes of German businesses that closed down over the period 2014 to 2016. The likelihood attains its maximum for the copula parameter at the boundary of the parameter space, so that the <span><math><mi>p</mi></math></span>-value of the test is 0.5. The life expectancy does not increase relative to the year of foundation. Using a Farlie–Gumbel–Morgenstern copula, which models both positive and negative dependence, we find that the life expectancy of German enterprises even decreases significantly over time. A simulation under the conditions of the application suggests that the tests retain the nominal level and have good power.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106194"},"PeriodicalIF":0.9,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S037837582400051X/pdfft?md5=a5bc737bb68bd11a1a31f4aeb333c40e&pid=1-s2.0-S037837582400051X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141240222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}