Flusion: Integrating multiple data sources for accurate influenza predictions

IF 3.0 · Zone 3 (Medicine) · JCR Q2 (INFECTIOUS DISEASES)
Evan L. Ray, Yijin Wang, Russell D. Wolfinger, Nicholas G. Reich
Journal: Epidemics, volume 50, Article 100810
Publication type: Journal Article
Publication date: 2024-12-25
DOI: 10.1016/j.epidem.2024.100810
URL: https://www.sciencedirect.com/science/article/pii/S1755436524000719
Citations: 0

Abstract

Over the last ten years, the US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge with the motivation that accurate probabilistic forecasts could improve situational awareness and yield more effective public health actions. Starting with the 2021/22 influenza season, the forecasting targets for this challenge have been based on hospital admissions reported in the CDC’s National Healthcare Safety Network (NHSN) surveillance system. Reporting of influenza hospital admissions through NHSN began within the last few years, and as such only a limited amount of historical data are available for this target signal. To produce forecasts in the presence of limited data for the target surveillance system, we augmented these data with two signals that have a longer historical record: 1) ILI+, which estimates the proportion of outpatient doctor visits where the patient has influenza; and 2) rates of laboratory-confirmed influenza hospitalizations at a selected set of healthcare facilities. Our model, Flusion, is an ensemble model that combines two machine learning models using gradient boosting for quantile regression based on different feature sets with a Bayesian autoregressive model. The gradient boosting models were trained on all three data signals, while the autoregressive model was trained only on data for the target surveillance signal, NHSN admissions; all three models were trained jointly on data for multiple locations. In each week of the influenza season, these models produced quantiles of a predictive distribution of influenza hospital admissions in each state for the current week and the following three weeks; the ensemble prediction was computed by averaging these quantile predictions. Flusion emerged as the top-performing model in the CDC’s influenza prediction challenge for the 2023/24 season.
In this article we investigate the factors contributing to Flusion’s success, and we find that its strong performance was primarily driven by the use of a gradient boosting model that was trained jointly on data from multiple surveillance signals and multiple locations. These results indicate the value of sharing information across multiple locations and surveillance signals, especially when doing so adds to the pool of available training data.
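
The abstract states that the ensemble prediction was computed by averaging the component models' quantile predictions. The snippet below is a minimal sketch of that idea, not the authors' implementation. It assumes an equal-weight mean taken separately at each quantile level; the quantile levels, the function name `average_quantiles`, and all forecast values are hypothetical and chosen only for illustration.

```python
from statistics import fmean

# Hypothetical quantile levels; the levels used in the actual challenge are not
# given in the abstract.
QUANTILE_LEVELS = [0.025, 0.25, 0.5, 0.75, 0.975]


def average_quantiles(component_forecasts):
    """Average predicted values level-by-level across component models.

    Each element of component_forecasts is a list of predicted admissions,
    one value per entry of QUANTILE_LEVELS. Equal weights are an assumption.
    """
    n_levels = len(component_forecasts[0])
    if any(len(f) != n_levels for f in component_forecasts):
        raise ValueError("all components must predict the same quantile levels")
    # zip(*...) groups the predictions that share a quantile level.
    return [fmean(values) for values in zip(*component_forecasts)]


# Made-up forecasts for one state and one horizon (not data from the paper).
gbq_a = [10.0, 18.0, 24.0, 31.0, 50.0]  # gradient boosting, first feature set
gbq_b = [12.0, 20.0, 26.0, 33.0, 55.0]  # gradient boosting, second feature set
ar = [8.0, 16.0, 22.0, 32.0, 60.0]      # Bayesian autoregressive model

ensemble = average_quantiles([gbq_a, gbq_b, ar])
for level, value in zip(QUANTILE_LEVELS, ensemble):
    print(f"q{level}: {value:.1f}")
```

Because each component's quantiles are non-decreasing in the quantile level, their level-wise mean is non-decreasing too, so the averaged values still form a valid set of quantiles.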
Source journal: Epidemics (INFECTIOUS DISEASES)
CiteScore: 6.00
Self-citation rate: 7.90%
Articles published: 92
Review time: 140 days
About the journal: Epidemics publishes papers on infectious disease dynamics in the broadest sense. Its scope covers both within-host dynamics of infectious agents and dynamics at the population level, particularly the interaction between the two. Areas of emphasis include: spread, transmission, persistence, implications and population dynamics of infectious diseases; population and public health as well as policy aspects of control and prevention; dynamics at the individual level; interaction with the environment, ecology and evolution of infectious diseases, as well as population genetics of infectious agents.