Computationally efficient spatio-temporal disease mapping for big data
Author: Duncan Lee
Journal: Spatial Statistics, volume 67, Article 100901
Published: 2025-04-23
DOI: 10.1016/j.spasta.2025.100901
URL: https://www.sciencedirect.com/science/article/pii/S2211675325000235
Journal impact factor: 2.1 (JCR Q3, Geosciences, Multidisciplinary)
Citations: 0
Abstract
Disease mapping models estimate the spatio-temporal variation in population-level disease risks or rates across a set of K areal units for N time periods, aiming to identify temporal trends and spatial hotspots. Highly parameterised Bayesian hierarchical models with over KN random effects, which are assigned autoregressive and conditional autoregressive prior distributions, are commonly used to estimate this spatio-temporal variation. These models work well when there are tens of thousands of data points, but are likely to be computationally burdensome when this rises to hundreds of thousands or above. This paper proposes a computationally efficient alternative, which can fit a range of spatio-temporal disease trends almost as well as existing highly parameterised models but takes only around 5% to 40% of the time to implement. It achieves this by modelling the average spatial and temporal trends in the data with autoregressive type random effects, which are augmented by an observation-driven process that uses functions of earlier data as additional covariates in the model. The efficacy of this methodology is tested by simulation, before being applied to the motivating study, which estimates the spatio-temporal trends in the prevalence of asthma, cancer, coronary heart disease and chronic obstructive pulmonary disease for K = 32,751 small areas over N = 13 years in England.
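The abstract does not say which functions of earlier data the observation-driven process uses, so the following is only a minimal sketch of the general idea, not the paper's specification. It assumes two illustrative covariates: the previous period's log rate in the same area, and the mean of the previous period's log rates across neighbouring areas. The function name `lagged_covariates`, the neighbourhood matrix `W`, the expected counts `E` and the continuity correction `eps` are all hypothetical choices made for this example.

```python
import numpy as np


def lagged_covariates(Y, E, W, eps=0.5):
    """Build two illustrative observation-driven covariates (an assumption,
    not the paper's definition) from the previous period's data.

    Y : (K, N) observed disease counts for K areas and N time periods
    E : (K, N) expected counts (or populations at risk)
    W : (K, K) binary neighbourhood matrix, W[k, j] = 1 if k and j are neighbours
    eps : continuity correction guarding against log(0)

    Returns two (K, N - 1) arrays aligned with periods 2..N:
      own_lag[k, t]   = log rate in area k in the previous period
      neigh_lag[k, t] = mean log rate over the neighbours of k in the previous period
    """
    Y = np.asarray(Y, dtype=float)
    E = np.asarray(E, dtype=float)
    W = np.asarray(W, dtype=float)

    log_rate = np.log((Y + eps) / (E + eps))

    # Covariate 1: the area's own log rate one period earlier.
    own_lag = log_rate[:, :-1]

    # Covariate 2: average of the neighbours' log rates one period earlier.
    n_neigh = W.sum(axis=1, keepdims=True)
    n_neigh[n_neigh == 0] = 1.0  # areas with no neighbours get a value of 0
    neigh_lag = (W @ log_rate[:, :-1]) / n_neigh

    return own_lag, neigh_lag


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K, N = 4, 5
    # Hypothetical chain of four areas: 0-1, 1-2, 2-3.
    W = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
    E = np.full((K, N), 50.0)
    Y = rng.poisson(E)
    own_lag, neigh_lag = lagged_covariates(Y, E, W)
    print(own_lag.shape, neigh_lag.shape)
```

Covariates built this way are fixed, known quantities once the earlier data are observed, so they add regression coefficients rather than random effects. That is the plausible source of the computational saving relative to a model with more than KN random effects, which for the study's K = 32,751 and N = 13 means over 425,763 of them.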
About the journal:
Spatial Statistics publishes articles on the theory and application of spatial and spatio-temporal statistics. It favours manuscripts that present theory generated by new applications, or in which new theory is applied to an important practical case. A purely theoretical study will only rarely be accepted. Pure case studies without methodological development are not acceptable for publication.
Spatial statistics concerns the quantitative analysis of spatial and spatio-temporal data, including their statistical dependencies, accuracy and uncertainties. Methodology for spatial statistics is typically found in probability theory, stochastic modelling and mathematical statistics as well as in information science. Spatial statistics is used in mapping, assessing spatial data quality, sampling design optimisation, modelling of dependence structures, and drawing of valid inference from a limited set of spatio-temporal data.