Statistical Modelling. Pub Date: 2021-12-01. Epub Date: 2020-08-21. DOI: 10.1177/1471082x20930894
M Menictas, T H Nolan, D G Simpson, M P Wand
{"title":"Streamlined variational inference for higher level group-specific curve models.","authors":"M Menictas, T H Nolan, D G Simpson, M P Wand","doi":"10.1177/1471082x20930894","DOIUrl":"https://doi.org/10.1177/1471082x20930894","url":null,"abstract":"A two-level group-specific curve model is such that the mean response of each member of a group is a separate smooth function of a predictor of interest. The three-level extension is such that one grouping variable is nested within another one, and higher level extensions are analogous. Streamlined variational inference for higher level group-specific curve models is a challenging problem. We confront it by systematically working through two-level and then three-level cases and making use of the higher level sparse matrix infrastructure laid down in Nolan and Wand (2019). A motivation is analysis of data from ultrasound technology for which three-level group-specific curve models are appropriate. Whilst extension to the number of levels exceeding three is not covered explicitly, the pattern established by our systematic approach sheds light on what is required for even higher level group-specific curve models.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/1471082x20930894","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39913169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reflections on statistical modelling: A conversation with Murray Aitkin","authors":"M. Aitkin, J. Hinde, Brian Francis","doi":"10.1177/1471082X211060560","DOIUrl":"https://doi.org/10.1177/1471082X211060560","url":null,"abstract":"A virtual interview with Murray Aitkin by Brian Francis and John Hinde, two of the original members of the Centre for Applied Statistics that Murray created at Lancaster University. The talk ranges over Murray's reflections of a career in statistical modelling and the many different collaborations across the world that have been such a significant part of it.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48389187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian analysis of two-part nonlinear latent variable model: Semiparametric method","authors":"Jianwei Gou, Ye-mao Xia, De-Peng Jiang","doi":"10.1177/1471082X211059233","DOIUrl":"https://doi.org/10.1177/1471082X211059233","url":null,"abstract":"The two-part model (TPM) is a widely appreciated statistical method for analyzing semi-continuous data. Semi-continuous data can be viewed as arising from two distinct stochastic processes: one governs the occurrence or binary part of data and the other determines the intensity or continuous part. In the regression setting with the semi-continuous outcome as functions of covariates, the binary part is commonly modelled via logistic regression and the continuous component via a log-normal model. The conventional TPM still imposes assumptions such as log-normal distribution of the continuous part, with no unobserved heterogeneity among the response, and no collinearity among covariates, which are quite often unrealistic in practical applications. In this article, we develop a two-part nonlinear latent variable model (TPNLVM) with mixed multiple semi-continuous and continuous variables. The semi-continuous variables are treated as indicators of the latent factor analysis along with other manifest variables. This reduces the dimensionality of the regression model and alleviates the potential multicollinearity problems. Our TPNLVM can accommodate the nonlinear relationships among latent variables extracted from the factor analysis. To downweight the influence of distribution deviations and extreme observations, we develop a Bayesian semiparametric analysis procedure. The conventional parametric assumptions on the related distributions are relaxed and the Dirichlet process (DP) prior is used to improve model fitting. By taking advantage of the discreteness of DP, our method is effective in capturing the heterogeneity underlying the population. 
Within the Bayesian paradigm, posterior inference, including parameter estimation and model assessment, is carried out through Markov chain Monte Carlo (MCMC) sampling. To facilitate posterior sampling, we adapt the Polya-Gamma stochastic representation for the logistic model. Using simulation studies, we examine properties and merits of our proposed methods and illustrate our approach by evaluating the effect of treatment on cocaine use and examining whether the treatment effect is moderated by psychiatric problems.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47521798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-parameter regression survival modelling with random effects","authors":"Fatima-Zahra Jaouimaa, I. Ha, Kevin Burke","doi":"10.1177/1471082x221117377","DOIUrl":"https://doi.org/10.1177/1471082x221117377","url":null,"abstract":"We consider a parametric modelling approach for survival data where covariates are allowed to enter the model through multiple distributional parameters (i.e., scale and shape). This is in contrast with the standard convention of having a single covariate-dependent parameter, typically the scale. Taking what is referred to as a multi-parameter regression (MPR) approach to modelling has been shown to produce flexible and robust models with relatively low model complexity cost. However, it is very common to have clustered data arising from survival analysis studies, and this is something that is underdeveloped in the MPR context. The purpose of this article is to extend MPR models to handle multivariate survival data by introducing random effects in both the scale and the shape regression components. We consider a variety of possible dependence structures for these random effects (independent, shared and correlated), and estimation proceeds using an h-likelihood approach. 
The performance of our estimation procedure is investigated by way of an extensive simulation study, and the merits of our modelling approach are illustrated through applications to two real data examples, a lung cancer dataset and a bladder cancer dataset.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48196771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A joint transition model for evaluating eGFR as biomarker for rejection after kidney transplantation","authors":"M. Coemans, G. Verbeke, M. Naesens","doi":"10.1177/1471082X211048695","DOIUrl":"https://doi.org/10.1177/1471082X211048695","url":null,"abstract":"The estimated glomerular filtration rate (eGFR) quantifies kidney graft function and is measured repeatedly after transplantation. Kidney graft rejection is diagnosed by performing biopsies on a regular basis (protocol biopsies at time of stable eGFR) or by performing biopsies due to clinical cause (indication biopsies at time of declining eGFR). The diagnostic value of the eGFR evolution as biomarker for rejection is not well established. To this end, we built a joint model which combines characteristics of transition models and shared parameter models to carry over information from one biopsy to the next, taking into account the longitudinal information of eGFR collected in between. From our model, applied to data of University Hospitals Leuven (870 transplantations, 2 635 biopsies), we conclude that a negative deviation from the mean eGFR slope increases the probability of rejection in indication biopsies, but that, on top of the biopsy history, there is little benefit in using the eGFR profile for diagnosing rejection. Methodologically, our model fills a gap in the biomarker literature by relating a frequently (repeatedly) measured continuous outcome with a less frequently (repeatedly) measured binary indicator. 
The developed joint transition model is flexible and applicable to multiple other research settings.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46812969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to Poisson–Tweedie mixed-effects model: A flexible approach for the analysis of longitudinal RNA-seq data","authors":"","doi":"10.1177/1471082x211014368","DOIUrl":"https://doi.org/10.1177/1471082x211014368","url":null,"abstract":"“Poisson–Tweedie mixed-effects model: A flexible approach for the analysis of longitudinal RNA-seq data” by Mirko Signorelli, Pietro Spitali and Roula Tsonaka was published in Statistical Modelling, OnlineFirst 24 August 2020, DOI: 10.1177/1471082X20936017. The authors have recently identified two mistakes in the R code that they used to estimate the Poisson–Tweedie mixed model (ptmixed) in simulations C and D, whose results are presented in Section 3.3 of the OnlineFirst version of the article. Therefore, they have rerun these simulations with the corrected code and updated the results of Section 3.3 accordingly. The amended results of simulations C and D will be published in the OnlineFirst version of the article and the subsequent issue in which it is published.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/1471082x211014368","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49223831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outlier accommodation with semiparametric density processes: A study of Antarctic snow density modelling","authors":"Daniel Sheanshang, P. White, D. Keeler","doi":"10.1177/1471082X211043946","DOIUrl":"https://doi.org/10.1177/1471082X211043946","url":null,"abstract":"In many settings, data acquisition generates outliers that can obscure inference. Therefore, practitioners often either identify and remove outliers or accommodate outliers using robust models. However, identifying and removing outliers is often an ad hoc process that affects inference, and robust methods are often too simple for some applications. In our motivating application, scientists drill snow cores and measure snow density to infer densification rates that aid in estimating snow water accumulation rates and glacier mass balances. Advanced measurement techniques can measure density at high resolution over depth but are sensitive to core imperfections, making them prone to outliers. Outlier accommodation is challenging in this setting because the distribution of outliers evolves over depth and the data demonstrate natural heteroscedasticity. To address these challenges, we present a two-component mixture model using a physically motivated snow density model and an outlier model, both of which evolve over depth. The physical component of the mixture model has a mean function with normally distributed depth-dependent heteroscedastic errors. The outlier component is specified using a semiparametric prior density process constructed through a normalized process convolution of log-normal random variables. 
We demonstrate that this model outperforms alternatives and can be used for various inferential tasks.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43671214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian framework for modelling the preferential selection process in respondent-driven sampling","authors":"Katherine R. McLaughlin","doi":"10.1177/1471082X211043945","DOIUrl":"https://doi.org/10.1177/1471082X211043945","url":null,"abstract":"In sampling designs that utilize peer recruitment, the sampling process is partially unknown and must be modelled to make inference about the population and estimate standard outcomes like prevalence. We develop a Bayesian model for the recruitment process for respondent-driven sampling (RDS), a network sampling methodology used worldwide to sample hidden populations that are not reachable by conventional sampling techniques, including those at high risk for HIV/AIDS. Current models for the RDS sampling process typically assume that recruitment occurs randomly given the population social network, but this is likely untrue in practice. To model preferential selection on covariates, we develop a sequential two-sided rational choice framework, which allows generative probabilistic network models to be created for the RDS sampling process. In the rational choice framework, members of the population make recruitment and participation choices based on observable nodal and dyadic covariates to maximize their utility given constraints. Inference is made about recruitment preferences given the observed recruitment chain in a Bayesian framework by incorporating the latent utilities and sampling from the joint posterior distribution via Markov chain Monte Carlo. 
We present simulation results and apply the model to an RDS study of Francophone migrants in Rabat, Morocco.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46175653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixture models and networks: The stochastic blockmodel","authors":"G. De Nicola, Benjamin Sischka, G. Kauermann","doi":"10.1177/1471082X211033169","DOIUrl":"https://doi.org/10.1177/1471082X211033169","url":null,"abstract":"Mixture models are probabilistic models aimed at uncovering and representing latent subgroups within a population. In the realm of network data analysis, the latent subgroups of nodes are typically identified by their connectivity behaviour, with nodes behaving similarly belonging to the same community. In this context, mixture modelling is pursued through stochastic blockmodelling. We consider stochastic blockmodels and some of their variants and extensions from a mixture modelling perspective. We also explore some of the main classes of estimation methods available and propose an alternative approach based on the reformulation of the blockmodel as a graphon. In addition to the discussion of inferential properties and estimating procedures, we focus on the application of the models to several real-world network datasets, showcasing the advantages and pitfalls of different approaches.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42647312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling agreement for binary intensive longitudinal data","authors":"S. Vanbelle, E. Lesaffre","doi":"10.1177/1471082X211034002","DOIUrl":"https://doi.org/10.1177/1471082X211034002","url":null,"abstract":"Devices that measure our physical, medical and mental condition have entered our daily life recently. Such devices measure our status in a continuous manner and can be useful in predicting future medical events or can guide us towards a healthier life. It is therefore important to establish that such devices record our behaviour in a reliable manner and measure what we believe they measure. In this article, we propose to measure the reliability and validity of a newly developed measuring device in time using a longitudinal model for sequential kappa statistics. We propose a Bayesian estimation procedure. The method is illustrated by a validation study of a new accelerometer in cardiopulmonary rehabilitation patients.","PeriodicalId":49476,"journal":{"name":"Statistical Modelling","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46812004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}