A Cluster-Aggregate-Pool (CAP) ensemble algorithm for improved forecast performance of influenza-like illness

IF 2.4 | CAS Zone 3 (Medicine) | JCR Q2 (Infectious Diseases)

Epidemics Pub Date : 2025-06-03 DOI:10.1016/j.epidem.2025.100832

Ningxi Wei, Xinze Zhou, Wei-Min Huang, Thomas McAndrew


Citations: 0

Abstract


A Cluster-Aggregate-Pool (CAP) ensemble algorithm for improved forecast performance of influenza-like illness

Seasonal influenza causes on average 425,000 hospitalizations and 32,000 deaths per year in the United States. Forecasts of influenza-like illness (ILI), a surrogate for the proportion of patients infected with influenza, support public health decision making. The goal of an ensemble forecast of ILI is to increase accuracy and calibration compared to individual forecasts and to provide a single, cohesive prediction of future influenza. However, an ensemble may be composed of models that produce similar forecasts, causing issues with ensemble forecast performance and non-identifiability. To address these issues we propose a novel Cluster-Aggregate-Pool or ‘CAP’ ensemble algorithm that first groups individual forecasts into clusters, aggregates forecasts that belong to the same cluster into a single forecast (called a cluster forecast), and then combines the cluster forecasts via a linear pool. We evaluated this algorithm on a benchmark dataset of 7 seasons of ILI plus forecasts generated by 27 individual models as part of the FluSight project. Compared to a non-CAP approach, we find that a CAP ensemble improves calibration by approximately 10% while maintaining similar accuracy. In addition, our CAP algorithm (i) generalizes past ensemble work associated with influenza forecasting and introduces a framework for future ensemble work, (ii) automatically accounts for missing forecasts from individual models, (iii) allows public health officials to participate in the ensemble by assigning individual models to clusters, and (iv) provides an additional signal about when peak influenza may be near.
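The three steps named in the abstract (cluster, aggregate, pool) can be illustrated with a minimal sketch. The abstract does not specify how clusters are formed, how forecasts are aggregated within a cluster, or how pool weights are chosen, so everything below is an assumption made for illustration: forecasts are treated as probability vectors over a shared set of bins, cluster labels are assigned by hand, aggregation is an equal-weight mean, and the function name `cap_ensemble` and all model names are hypothetical.

```python
import numpy as np

def cap_ensemble(forecasts, cluster_of, cluster_weights):
    """Illustrative Cluster-Aggregate-Pool sketch (not the authors' implementation).

    forecasts:       dict model name -> sequence of bin probabilities, or None if missing
    cluster_of:      dict model name -> cluster label (assumed to be assigned by hand)
    cluster_weights: dict cluster label -> nonnegative linear-pool weight
    """
    # Cluster: group the forecasts that were actually submitted by cluster label.
    groups = {}
    for model, probs in forecasts.items():
        if probs is None:  # a missing forecast is simply skipped
            continue
        groups.setdefault(cluster_of[model], []).append(np.asarray(probs, dtype=float))
    if not groups:
        raise ValueError("no forecasts available")

    # Aggregate: one cluster forecast per cluster (assumed: equal-weight mean).
    cluster_forecasts = {c: np.mean(members, axis=0) for c, members in groups.items()}

    # Pool: linear pool over cluster forecasts, with weights renormalized
    # over the clusters that have at least one available forecast.
    total = sum(cluster_weights[c] for c in cluster_forecasts)
    if total <= 0:
        raise ValueError("cluster weights must sum to a positive value")
    return sum(cluster_weights[c] / total * f for c, f in cluster_forecasts.items())

# Hypothetical example: models A and B give similar forecasts, D is missing.
forecasts = {
    "A": [0.2, 0.5, 0.3],
    "B": [0.4, 0.4, 0.2],
    "C": [0.1, 0.3, 0.6],
    "D": None,
}
cluster_of = {"A": "mech", "B": "mech", "C": "stat", "D": "stat"}
cluster_weights = {"mech": 0.5, "stat": 0.5}

pooled = cap_ensemble(forecasts, cluster_of, cluster_weights)
print(pooled)  # approximately [0.2, 0.375, 0.425]
```

In this toy example the two similar models A and B together carry the same weight as model C alone, whereas an equal-weight pool over individual models would give them two thirds of the total weight. The missing forecast from D only changes which members contribute to its cluster.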


Source journal

Epidemics (Infectious Diseases)

CiteScore

6.00

Self-citation rate

7.90%

Articles published

Review time

140 days

About the journal: Epidemics publishes papers on infectious disease dynamics in the broadest sense. Its scope covers both within-host dynamics of infectious agents and dynamics at the population level, particularly the interaction between the two. Areas of emphasis include: spread, transmission, persistence, implications and population dynamics of infectious diseases; population and public health as well as policy aspects of control and prevention; dynamics at the individual level; interaction with the environment, ecology and evolution of infectious diseases, as well as population genetics of infectious agents.