L. Rimella, S. Alderton, M. Sammarro, B. Rowlingson, D. Cocker, N. Feasey, P. Fearnhead, C. Jewell
{"title":"Inference on extended-spectrum beta-lactamase Escherichia coli and Klebsiella pneumoniae data through SMC2","authors":"L. Rimella, S. Alderton, M. Sammarro, B. Rowlingson, D. Cocker, N. Feasey, P. Fearnhead, C. Jewell","doi":"10.1093/jrsssc/qlad055","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad055","url":null,"abstract":"We propose a novel stochastic model for the spread of antimicrobial-resistant bacteria in a population, together with an efficient algorithm for fitting such a model to sample data. We introduce an individual-based model for the epidemic, with the state of the model determining which individuals are colonised by the bacteria. The transmission rate of the epidemic takes into account individuals’ locations and covariates, seasonality, and environmental effects. The state of our model is only partially observed, with data consisting of test results from individuals from a sample of households. Fitting our model to data is challenging due to the large state space of our model. We develop an efficient SMC2 algorithm to estimate parameters and compare models for the transmission rate. We implement this algorithm in a computationally efficient manner by using the scale invariance properties of the underlying epidemic model. Our motivating application focuses on the dynamics of community-acquired extended-spectrum beta-lactamase-producing Escherichia coli and Klebsiella pneumoniae, using data collected as part of the Drivers of Resistance in Uganda and Malawi project. 
We infer the parameters of the model and learn key epidemic quantities such as the effective reproduction number, spatial distribution of prevalence, household cluster dynamics, and seasonality.","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"5 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86122983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shen-Ming Lee, Truong-Nhat Le, Phuoc-Loc Tran, Chin-Shang Li
{"title":"Investigating the association of a sensitive attribute with a random variable using the Christofides generalised randomised response design and Bayesian methods","authors":"Shen-Ming Lee, Truong-Nhat Le, Phuoc-Loc Tran, Chin-Shang Li","doi":"10.1111/rssc.12585","DOIUrl":"10.1111/rssc.12585","url":null,"abstract":"<p>In empirical studies involving sensitive topics, in addition to the problem of estimating the population proportion with a sensitive characteristic, a question arises as to whether or not there is heterogeneity in the distribution of an auxiliary random variable representing the information of subjects collected from a sensitive group and a non-sensitive group. That is, it is of interest to investigate the influence of the sensitive attribute on the auxiliary random variable of interest. Finite mixture models are utilised to evaluate the association. A proposed Bayesian method through data augmentation and Markov chain Monte Carlo is applied to estimate unknown parameters of interest. Deviance information criterion and marginal likelihood are employed to select a suitable model to describe the association of the sensitive characteristic with the auxiliary random variable. Simulation and real data studies are conducted to assess the performance of and illustrate applications of the proposed methodology.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1471-1502"},"PeriodicalIF":1.6,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88884368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Said el Bouhaddani, Hae-Won Uh, Geurt Jongbloed, Jeanine Houwing-Duistermaat
{"title":"Statistical integration of heterogeneous omics data: Probabilistic two-way partial least squares (PO2PLS)","authors":"Said el Bouhaddani, Hae-Won Uh, Geurt Jongbloed, Jeanine Houwing-Duistermaat","doi":"10.1111/rssc.12583","DOIUrl":"10.1111/rssc.12583","url":null,"abstract":"<p>The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), that addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we propose a novel fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for the relationship between two datasets is proposed, specifically addressing the high dimensionality, and its asymptotic distribution is derived. Notably, several existing data integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case–control study. Besides recovering known relationships, PO2PLS also identified novel findings. 
The methods are implemented in our R-package <i>PO2PLS</i>.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1451-1470"},"PeriodicalIF":1.6,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12583","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74773208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelling time-varying rankings with autoregressive and score-driven dynamics","authors":"Vladimír Holý, Jan Zouhar","doi":"10.1111/rssc.12584","DOIUrl":"10.1111/rssc.12584","url":null,"abstract":"<p>We develop a new statistical model to analyse time-varying ranking data. The model can be used with a large number of ranked items, accommodates exogenous time-varying covariates and partial rankings, and is estimated via the maximum likelihood in a straightforward manner. Rankings are modelled using the Plackett–Luce distribution with time-varying worth parameters that follow a mean-reverting time series process. To capture the dependence of the worth parameters on past rankings, we utilise the conditional score in the fashion of the generalised autoregressive score models. Simulation experiments show that the small-sample properties of the maximum-likelihood estimator improve rapidly with the length of the time series and suggest that statistical inference relying on conventional Hessian-based standard errors is usable even for medium-sized samples. In an empirical study, we apply the model to the results of the Ice Hockey World Championships. We also discuss applications to rankings based on underlying indices, repeated surveys and non-parametric efficiency analysis.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1427-1450"},"PeriodicalIF":1.6,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83166449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Owen G. Ward, Jing Wu, Tian Zheng, Anna L. Smith, James P. Curley
{"title":"Network Hawkes process models for exploring latent hierarchy in social animal interactions","authors":"Owen G. Ward, Jing Wu, Tian Zheng, Anna L. Smith, James P. Curley","doi":"10.1111/rssc.12581","DOIUrl":"10.1111/rssc.12581","url":null,"abstract":"<p>Group-based social dominance hierarchies are of essential interest in understanding social structure (DeDeo & Hobson in, Proceedings of the National Academy of Sciences 118(21), 2021). Recent animal behaviour research studies can record aggressive interactions observed over time. Models that can explore the underlying hierarchy from the observed temporal dynamics in behaviours are therefore crucial. Traditional ranking methods aggregate interactions across time into win/loss counts, equalizing dynamic interactions with the underlying hierarchy. Although these models have gleaned important behavioural insights from such data, they are limited in addressing many important questions that remain unresolved. In this paper, we take advantage of the observed interactions' timestamps, proposing a series of network point process models with latent ranks. We carefully design these models to incorporate important theories on animal behaviour that account for dynamic patterns observed in the interaction data, including the winner effect, bursting and pair-flip phenomena. Through iteratively constructing and evaluating these models we arrive at the final cohort Markov-modulated Hawkes process (C-MMHP), which best characterizes all aforementioned patterns observed in interaction data. As such, inference on our model components can be readily interpreted in terms of theories on animal behaviours. The probabilistic nature of our model allows us to estimate the uncertainty in our ranking. In particular, our model is able to provide insights into the distribution of power within the hierarchy which forms and the strength of the established hierarchy. We compare all models using simulated and real data. 
Using statistically developed diagnostic perspectives, we demonstrate that the C-MMHP model outperforms other methods, capturing relevant latent ranking structures that lead to meaningful predictions for real data.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1402-1426"},"PeriodicalIF":1.6,"publicationDate":"2022-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82071302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco Riani, Anthony C. Atkinson, Francesca Torti, Aldo Corbellini
{"title":"Robust correspondence analysis","authors":"Marco Riani, Anthony C. Atkinson, Francesca Torti, Aldo Corbellini","doi":"10.1111/rssc.12580","DOIUrl":"10.1111/rssc.12580","url":null,"abstract":"<p>Correspondence analysis is a method for the visual display of information from two-way contingency tables. We introduce a robust form of correspondence analysis based on minimum covariance determinant estimation. This leads to the systematic deletion of outlying rows of the table and to plots of greatly increased informativeness. Our examples are trade flows of clothes and consumer evaluations of the perceived properties of cars. The robust method requires that a specified proportion of the data be used in fitting. To accommodate this requirement we provide an algorithm that uses a subset of complete rows and one row partially, both sets of rows being chosen robustly. We prove the convergence of this algorithm.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1381-1401"},"PeriodicalIF":1.6,"publicationDate":"2022-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12580","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82808130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal ETAS model with a renewal main-shock arrival process","authors":"Tom Stindl, Feng Chen","doi":"10.1111/rssc.12579","DOIUrl":"10.1111/rssc.12579","url":null,"abstract":"<p>We propose a spatiotemporal point process model that enhances the classical Epidemic-Type Aftershock Sequence (ETAS) model. This is achieved with the introduction of a renewal main-shock arrival process and we call this extension the renewal ETAS (RETAS) model. This modification is similar in spirit to the renewal Hawkes (RHawkes) process but the conditional intensity process supports a spatial component. It empowers the main-shock intensity to reset upon the arrival of main-shocks. This allows for heavier clustering of main-shocks than the classical spatiotemporal ETAS model. We introduce a likelihood evaluation algorithm for parameter estimation and provide a novel procedure to evaluate the fitted model's goodness-of-fit (GOF) based on a sequential application of the Rosenblatt transformation. A simulation algorithm for the RETAS model is outlined and used to validate the numerical performance of the likelihood evaluation algorithm and GOF test procedure. We illustrate the proposed model and methods on various earthquake catalogues around the world each with distinctly different seismic activity. 
These catalogues demonstrate the RETAS model's additional flexibility in comparison to the classical spatiotemporal ETAS model and emphasize the potential for superior modelling and forecasting of seismicity.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1356-1380"},"PeriodicalIF":1.6,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12579","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79644802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Specification analysis for technology use and teenager well-being: Statistical validity and a Bayesian proposal","authors":"Christoph Semken, David Rossell","doi":"10.1111/rssc.12578","DOIUrl":"10.1111/rssc.12578","url":null,"abstract":"A key issue in science is assessing robustness to data analysis choices, while avoiding selective reporting and providing valid inference. Specification Curve Analysis is a tool intended to prevent selective reporting. Alas, when used for inference it can create severe biases and false positives, due to wrongly adjusting for covariates, and mask important treatment effect heterogeneity. As our motivating application, it led an influential study to conclude there is no relevant association between technology use and teenager mental well‐being. We discuss these issues and propose a strategy for valid inference. Bayesian Specification Curve Analysis (BSCA) uses Bayesian Model Averaging to incorporate covariates and heterogeneous effects across treatments, outcomes and subpopulations. BSCA gives significantly different insights into teenager well‐being, revealing that the association with technology differs by device, gender and who assesses well‐being (teenagers or their parents).","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1330-1355"},"PeriodicalIF":1.6,"publicationDate":"2022-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12578","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83476610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Stival, M. Bernardi, Manuela Cattelan, P. Dellaportas
{"title":"Missing data patterns in runners’ careers: do they matter?","authors":"M. Stival, M. Bernardi, Manuela Cattelan, P. Dellaportas","doi":"10.1093/jrsssc/qlad009","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad009","url":null,"abstract":"Predicting the future performance of young runners is an important research issue in experimental sports science and performance analysis. We analyse a dataset with annual seasonal best performances of male middle distance runners for a period of 14 years and provide a modelling framework that accounts for both the fact that each runner has typically run in 3 distance events (800, 1,500, and 5,000 m) and the presence of periods of no running activities. We propose a latent class matrix-variate state space model and we empirically demonstrate that accounting for missing data patterns in runners’ careers improves the out of sample prediction of their performances over time. In particular, we demonstrate that for this analysis, the missing data patterns provide valuable information for the prediction of runner’s performance.","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"6 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80455942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous graphical model for non-negative and non-Gaussian PM<sub>2.5</sub> data","authors":"Jiaqi Zhang, Xinyan Fan, Yang Li, Shuangge Ma","doi":"10.1111/rssc.12575","DOIUrl":"10.1111/rssc.12575","url":null,"abstract":"<p>Studies on the conditional relationships between PM<sub>2.5</sub> concentrations among different regions are of great interest for the joint prevention and control of air pollution. Because of seasonal changes in atmospheric conditions, spatial patterns of PM<sub>2.5</sub> may differ throughout the year. Additionally, concentration data are both non-negative and non-Gaussian. These data features pose significant challenges to existing methods. This study proposes a heterogeneous graphical model for non-negative and non-Gaussian data via the score matching loss. The proposed method simultaneously clusters multiple datasets and estimates a graph for variables with complex properties in each cluster. Furthermore, our model involves a network that indicates similarity among datasets, and this network can have additional applications. In simulation studies, the proposed method outperforms competing alternatives in both clustering and edge identification. We also analyse the PM<sub>2.5</sub> concentrations' spatial correlations in Taiwan's regions using data obtained in 2019 from 67 air-quality monitoring stations. The 12 months are clustered into four groups: January–March, April, May–September and October–December, and the corresponding graphs have 153, 57, 86 and 167 edges respectively. The results show obvious seasonality, which is consistent with the meteorological literature. 
Geographically, the PM<sub>2.5</sub> concentrations are more strongly correlated within the northern regions and within the southern regions of Taiwan. These results can provide valuable information for developing joint air-quality control strategies.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"71 5","pages":"1303-1329"},"PeriodicalIF":1.6,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82229817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}