{"title":"将霍克斯过程扩展到一百万个 COVID-19 病例","authors":"Seyoon Ko, Marc A. Suchard, Andrew J. Holbrook","doi":"arxiv-2407.11349","DOIUrl":null,"url":null,"abstract":"Hawkes stochastic point process models have emerged as valuable statistical\ntools for analyzing viral contagion. The spatiotemporal Hawkes process\ncharacterizes the speeds at which viruses spread within human populations.\nUnfortunately, likelihood-based inference using these models requires $O(N^2)$\nfloating-point operations, for $N$ the number of observed cases. Recent work\nresponds to the Hawkes likelihood's computational burden by developing\nefficient graphics processing unit (GPU)-based routines that enable Bayesian\nanalysis of tens-of-thousands of observations. We build on this work and\ndevelop a high-performance computing (HPC) strategy that divides 30 Markov\nchains between 4 GPU nodes, each of which uses multiple GPUs to accelerate its\nchain's likelihood computations. We use this framework to apply two\nspatiotemporal Hawkes models to the analysis of one million COVID-19 cases in\nthe United States between March 2020 and June 2023. In addition to brute-force\nHPC, we advocate for two simple strategies as scalable alternatives to\nsuccessful approaches proposed for small data settings. 
First, we use known\ncounty-specific population densities to build a spatially varying triggering\nkernel in a manner that avoids computationally costly nearest neighbors search.\nSecond, we use a cut-posterior inference routine that accounts for infections'\nspatial location uncertainty by iteratively sampling latent locations uniformly\nwithin their respective counties of occurrence, thereby avoiding full-blown\nlatent variable inference for 1,000,000 infection locations.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Scaling Hawkes processes to one million COVID-19 cases\",\"authors\":\"Seyoon Ko, Marc A. Suchard, Andrew J. Holbrook\",\"doi\":\"arxiv-2407.11349\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hawkes stochastic point process models have emerged as valuable statistical\\ntools for analyzing viral contagion. The spatiotemporal Hawkes process\\ncharacterizes the speeds at which viruses spread within human populations.\\nUnfortunately, likelihood-based inference using these models requires $O(N^2)$\\nfloating-point operations, for $N$ the number of observed cases. Recent work\\nresponds to the Hawkes likelihood's computational burden by developing\\nefficient graphics processing unit (GPU)-based routines that enable Bayesian\\nanalysis of tens-of-thousands of observations. We build on this work and\\ndevelop a high-performance computing (HPC) strategy that divides 30 Markov\\nchains between 4 GPU nodes, each of which uses multiple GPUs to accelerate its\\nchain's likelihood computations. We use this framework to apply two\\nspatiotemporal Hawkes models to the analysis of one million COVID-19 cases in\\nthe United States between March 2020 and June 2023. 
In addition to brute-force\\nHPC, we advocate for two simple strategies as scalable alternatives to\\nsuccessful approaches proposed for small data settings. First, we use known\\ncounty-specific population densities to build a spatially varying triggering\\nkernel in a manner that avoids computationally costly nearest neighbors search.\\nSecond, we use a cut-posterior inference routine that accounts for infections'\\nspatial location uncertainty by iteratively sampling latent locations uniformly\\nwithin their respective counties of occurrence, thereby avoiding full-blown\\nlatent variable inference for 1,000,000 infection locations.\",\"PeriodicalId\":501215,\"journal\":{\"name\":\"arXiv - STAT - Computation\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - STAT - Computation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.11349\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.11349","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Scaling Hawkes processes to one million COVID-19 cases
Hawkes stochastic point process models have emerged as valuable statistical
tools for analyzing viral contagion. The spatiotemporal Hawkes process
characterizes the speeds at which viruses spread within human populations.
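For context, a standard spatiotemporal Hawkes process (the abstract does not give the paper's exact parameterization) has conditional intensity $\lambda(t, \mathbf{x}) = \mu(t, \mathbf{x}) + \sum_{i : t_i < t} g(t - t_i, \mathbf{x} - \mathbf{x}_i)$, where $\mu$ is a background rate and $g$ a triggering kernel; each likelihood evaluation sums the triggering kernel over all earlier cases for every observed case, i.e., over all ordered case pairs.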
Unfortunately, likelihood-based inference using these models requires $O(N^2)$
floating-point operations, for $N$ the number of observed cases. Recent work
responds to the Hawkes likelihood's computational burden by developing
efficient graphics processing unit (GPU)-based routines that enable Bayesian
analysis of tens of thousands of observations. We build on this work and
develop a high-performance computing (HPC) strategy that divides 30 Markov
chains among 4 GPU nodes, each of which uses multiple GPUs to accelerate its
chain's likelihood computations. We use this framework to apply two
spatiotemporal Hawkes models to the analysis of one million COVID-19 cases in
the United States between March 2020 and June 2023. In addition to brute-force
HPC, we advocate for two simple strategies as scalable alternatives to
successful approaches proposed for small data settings. First, we use known
county-specific population densities to build a spatially varying triggering
kernel in a manner that avoids a computationally costly nearest-neighbor search.
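The abstract does not detail the kernel's functional form. As one hedged sketch, a county's known population density could set a per-case Gaussian bandwidth directly, so that no nearest-neighbor search is ever needed; the FIPS codes, density values, and inverse-square-root scaling below are all illustrative assumptions, not the paper's specification:

```python
import numpy as np

# Hypothetical county densities (people per km^2); a real analysis
# would use published county-level census figures.
county_density = {"06037": 2500.0, "56025": 5.0}

def bandwidth(fips, base=1.0):
    """Spatially varying kernel bandwidth: wider in sparse counties.

    The inverse-square-root scaling is an illustrative assumption; the
    key point is that a known density replaces any neighbor search.
    """
    return base / np.sqrt(county_density[fips])

def gaussian_trigger(x_i, x_j, fips_j):
    """Gaussian spatial triggering kernel centered at parent case j."""
    h = bandwidth(fips_j)
    d2 = np.sum((np.asarray(x_i) - np.asarray(x_j)) ** 2)
    return np.exp(-d2 / (2.0 * h * h)) / (2.0 * np.pi * h * h)
```

Because the bandwidth is a pure lookup, evaluating the kernel for any case pair costs $O(1)$ beyond the pairwise sum itself.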
Second, we use a cut-posterior inference routine that accounts for infections'
spatial location uncertainty by iteratively sampling latent locations uniformly
within their respective counties of occurrence, thereby avoiding full-blown
latent variable inference for 1,000,000 infection locations.
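The abstract describes the cut-posterior step only at a high level: latent locations are redrawn uniformly within their counties of occurrence, with no feedback from the Hawkes likelihood. A minimal sketch of that resampling sweep, using axis-aligned bounding boxes as a stand-in for true county polygons (the boxes and FIPS codes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bounding boxes (lon_min, lon_max, lat_min, lat_max)
# standing in for the county polygons a real analysis would use.
county_box = {
    "06037": (-118.9, -117.6, 33.7, 34.8),
    "56025": (-107.5, -106.0, 42.6, 43.5),
}

def resample_locations(county_fips):
    """One cut-posterior sweep: draw each latent location uniformly
    within its county of occurrence, independent of model parameters.

    Cutting the feedback from the Hawkes likelihood to the locations
    is what avoids full latent-variable inference over all cases.
    """
    fips = np.asarray(county_fips)
    lon = np.empty(fips.shape[0])
    lat = np.empty(fips.shape[0])
    for i, f in enumerate(fips):
        x0, x1, y0, y1 = county_box[f]
        lon[i] = rng.uniform(x0, x1)
        lat[i] = rng.uniform(y0, y1)
    return lon, lat
```

In this sketch, each MCMC iteration would call `resample_locations` and then update the Hawkes parameters conditional on the freshly imputed coordinates.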