{"title":"Exact confidence intervals for the difference of two proportions based on partially observed binary data","authors":"Chongxiu Yu, Weizhen Wang, Zhongzhan Zhang","doi":"10.1002/sta4.631","DOIUrl":"https://doi.org/10.1002/sta4.631","url":null,"abstract":"In a matched pairs experiment, two binary variables are typically observed on all subjects in the experiment. However, when one of the variables is missing on some subjects, we have so-called partially observed binary data, which consist of two parts: a multinomial from the subjects with a pair of observed variables and two independent binomials from the subjects with only one observed variable. The goal of this paper is to construct exact confidence intervals for the difference of two (success) proportions of the two binary variables. We first derive a new test by combining two score tests for the two parts of the data and invert it to obtain an asymptotic confidence interval. Since asymptotic intervals do not achieve the nominal level, this interval and three other existing intervals are improved to be exact by the general $h$-function method. We compare the infimum coverage probability and average interval length of these intervals and recommend the exact intervals that are improved from the newly proposed interval. 
Two real data sets are used to illustrate the intervals.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"10 9","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian process regression and classification using International Classification of Disease codes as covariates","authors":"Sanvesh Srivastava, Zongyi Xu, Yunyi Li, W. Nick Street, Stephanie Gilbertson-White","doi":"10.1002/sta4.618","DOIUrl":"https://doi.org/10.1002/sta4.618","url":null,"abstract":"In electronic health records (EHRs) data analysis, nonparametric regression and classification using International Classification of Disease (ICD) codes as covariates remain understudied. Automated methods have been developed over the years for predicting biomedical responses using EHRs, but relatively less attention has been paid to developing patient similarity measures that use ICD codes and chronic conditions, where a chronic condition is defined as a set of ICD codes. We address this problem by first developing a string kernel function for measuring the similarity between a pair of primary chronic conditions, represented as subsets of ICD codes. Second, we extend this similarity measure to a family of covariance functions on subsets of chronic conditions. This family is used in developing Gaussian process (GP) priors for Bayesian nonparametric regression and classification using diagnoses and other demographic information as covariates. Markov chain Monte Carlo (MCMC) algorithms are used for posterior inference and predictions. The proposed methods are tuning free, so they are ideal for automated prediction of biomedical responses depending on chronic conditions. We evaluate the practical performance of our method on EHR data collected from 1660 patients at the University of Iowa Hospitals and Clinics (UIHC) with six different primary cancer sites. 
Our method provides better sensitivity and specificity than its competitors in classifying different primary cancer sites and estimates the marginal associations between chronic conditions and primary cancer sites.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"27 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotic tail properties of Poisson mixture distributions","authors":"Samuel Valiquette, Gwladys Toulemonde, Jean Peyhardi, Éric Marchand, Frédéric Mortier","doi":"10.1002/sta4.622","DOIUrl":"https://doi.org/10.1002/sta4.622","url":null,"abstract":"Count data are omnipresent in many applied fields, often with overdispersion. With mixtures of Poisson distributions representing an elegant and appealing modelling strategy, we focus here on how the tail behaviour of the mixing distribution is related to the tail of the resulting Poisson mixture. We define five sets of mixing distributions, and we identify for each case whether the Poisson mixture is in, close to or far from a domain of attraction of maxima. We also characterize how the Poisson mixture behaves similarly to a standard Poisson distribution when the mixing distribution has a finite support. Finally, we study, both analytically and numerically, how goodness‐of‐fit can be assessed with the inspection of tail behaviour.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"5 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138526370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An importance sampling approach for reliable and efficient inference in Bayesian ordinary differential equation models","authors":"Juho Timonen, Nikolas Siccha, Ben Bales, Harri Lähdesmäki, Aki Vehtari","doi":"10.1002/sta4.614","DOIUrl":"https://doi.org/10.1002/sta4.614","url":null,"abstract":"Statistical models can involve implicitly defined quantities, such as solutions to nonlinear ordinary differential equations (ODEs), that unavoidably need to be numerically approximated in order to evaluate the model. The approximation error inherently biases statistical inference results, but the amount of this bias is generally unknown and often ignored in Bayesian parameter inference. We propose a computationally efficient method for verifying the reliability of posterior inference for such models, when the inference is performed using Markov chain Monte Carlo methods. We validate the efficiency and reliability of our workflow in experiments using simulated and real data and different ODE solvers. We highlight problems that arise with commonly used adaptive ODE solvers and propose robust and effective alternatives, which, accompanied by our workflow, can now be used without sacrificing the reliability of the inferences.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135149145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A modified partial envelope tensor response regression","authors":"Wenxing Guo, Narayanaswamy Balakrishnan, Shanshan Qin","doi":"10.1002/sta4.615","DOIUrl":"https://doi.org/10.1002/sta4.615","url":null,"abstract":"The envelope model is a useful statistical technique that can be applied to multivariate linear regression problems. It aims to remove immaterial information via sufficient dimension reduction techniques while still gaining efficiency and providing accurate parameter estimates. Recently, envelope tensor versions have been developed to extend this technique to tensor data. In this work, a partial tensor envelope model is proposed that allows for a parsimonious version of tensor response regression when only certain predictors are of interest. The consistency and asymptotic normality of the regression coefficients estimator are also established theoretically, which provides a rigorous foundation for the proposed method. In numerical studies using both simulated and real‐world data, the partial tensor envelope model is shown to outperform several existing methods in terms of the efficiency of the regression coefficients associated with the selected predictors.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135878270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New penalty in information criteria for the ARCH sequence with structural changes","authors":"Ryoto Ozaki, Yoshiyuki Ninomiya","doi":"10.1002/sta4.612","DOIUrl":"https://doi.org/10.1002/sta4.612","url":null,"abstract":"For change point models and autoregressive conditional heteroscedasticity (ARCH) models, which have long been important especially in econometrics, we develop information criteria that work well even when considering a combination of these models. Since the change point model does not satisfy the conventional statistical asymptotics, a formal Akaike information criterion (AIC) with twice the number of parameters as the penalty term would clearly result in overfitting. Therefore, we derive an AIC‐type information criterion from its original definition using asymptotics peculiar to the change point model. Specifically, we suppose time series data treated in econometrics and derive the Takeuchi information criterion (TIC) as the main information criterion allowing for model misspecification. It is confirmed that the penalty for the change point parameter is almost three times larger than the penalty for the regular parameter. We also derive the AIC in this setting from the TIC by removing the consideration of model misspecification. In numerical experiments, the derived TIC and AIC are compared with the formal AIC and Bayesian information criterion (BIC). It is shown that the derived information criteria clearly outperform the others in light of the original purpose of AIC, which is to give an estimate close to the true structure. 
We also confirm that the TIC appears to be superior to the AIC in the presence of model misspecification.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136071436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On pairwise interaction multivariate Pareto models","authors":"Michaël Lalancette","doi":"10.1002/sta4.613","DOIUrl":"https://doi.org/10.1002/sta4.613","url":null,"abstract":"The rich class of multivariate Pareto distributions forms the basis of recently introduced extremal graphical models. However, most existing literature on the topic is focused on the popular parametric family of Hüsler–Reiss distributions. It is shown that the Hüsler–Reiss family is in fact the only continuous multivariate Pareto model that exhibits the structure of a pairwise interaction model, justifying its use in many high‐dimensional problems. Along the way, useful insight is obtained concerning a certain class of distributions that generalize the Hüsler–Reiss family, a result of independent interest.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136072664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional mixture modelling for heavy‐tailed and skewed data","authors":"Aqi Dong, Volodymyr Melnykov, Yang Wang, Xuwen Zhu","doi":"10.1002/sta4.608","DOIUrl":"https://doi.org/10.1002/sta4.608","url":null,"abstract":"Overparameterization is a serious concern for multivariate mixture models as it can lead to model overfitting and, as a result, mixture order underestimation. Parsimonious modelling is one of the most effective remedies in this context. In Gaussian mixture models, the majority of parameters are associated with covariance matrices, and parsimonious models based on factor analysers and spectral decomposition of dispersion parameters are the most popular in the literature. Some drawbacks of these models include the lack of flexibility in imposing different covariance structures for individual components and limitations in modelling compact clusters. Recently introduced conditional mixture models provide substantial flexibility in addressing these concerns. The components of such mixtures are formulated as a product of conditional distributions with univariate Gaussian densities being the primary choice. However, the presence of heavy tails or skewness in any dimension can lead to fitting problems. We propose a flexible model that is free of the above‐mentioned limitations, which we call the contaminated transformation conditional mixture model, and demonstrate in a series of simulation studies that it can effectively account for skewness and heavy tails. 
Applications to real‐life data sets show good results and highlight the promise of the proposed model.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"48 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90717289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Likelihood‐based inference for linear mixed‐effects models using the generalized hyperbolic distribution","authors":"V. H. Lachos, M. Galea, C. Zeller, M. Prates","doi":"10.1002/sta4.602","DOIUrl":"https://doi.org/10.1002/sta4.602","url":null,"abstract":"In this paper, we develop statistical methodology for the analysis of data under nonnormal distributions, in the context of mixed effects models. Although the multivariate normal distribution is useful in many cases, it is not appropriate, for instance, when the data come from skewed and/or heavy‐tailed distributions. To analyse data with these characteristics, we extend the standard linear mixed effects model, considering the family of generalized hyperbolic distributions. We propose methods for statistical inference based on the likelihood function, and due to its complexity, the expectation‐maximization (EM) algorithm is used to find the maximum likelihood estimates, with the standard errors and the exact likelihood value as a by‐product. We use simulations to investigate the asymptotic properties of the EM estimates and prediction accuracy. A real example is analysed, illustrating the usefulness of the proposed methods.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"152 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85390536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proposed variable sampling interval maximum EWMA and distance EWMA charts with unknown process parameters","authors":"R. Parvin, M. Khoo, S. Saha, W. L. Teoh","doi":"10.1002/sta4.605","DOIUrl":"https://doi.org/10.1002/sta4.605","url":null,"abstract":"The variable sampling interval (VSI) exponentially weighted moving average (EWMA) chart, which varies the chart's sampling interval according to the value of the current plotting statistic, increases the speed of the standard EWMA chart in detecting shifts. Joint monitoring schemes use a single combined statistic for the mean and variance in process monitoring. To simultaneously monitor the mean and variance of a process from the normal distribution, two VSI EWMA schemes with unknown process parameters, based on (i) Maximum (Max) and (ii) Distance (Dis) type combining functions, are proposed in this paper. Each of these schemes uses a single plotting statistic. The effects of parameter estimation on the performance of the proposed VSI Max EWMA and VSI Dis EWMA schemes, in terms of the average time to signal, standard deviation of the time to signal, expected average time to signal and median time to signal criteria, are studied using Monte Carlo simulation. The results show that the proposed schemes can identify process shifts more quickly than the existing Max/Dis Shewhart (SH), Max/Dis cumulative sum (CUSUM) and Max/Dis EWMA schemes. 
The implementation of the proposed schemes is demonstrated using a commercial dataset.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"102 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81782833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}