Advancing real-time infectious disease forecasting using large language models

Impact factor: 18.3 | JCR Q1 | Computer Science, Interdisciplinary Applications

Nature Computational Science | Publication date: 2025-06-06 | DOI: 10.1038/s43588-025-00798-6

Hongru Du, Yang Zhao, Jianan Zhao, Shaochong Xu, Xihong Lin, Yiran Chen, Lauren M. Gardner, Hao ‘Frank’ Yang

Volume 5, issue 6, pages 467-480. Article page: https://www.nature.com/articles/s43588-025-00798-6


Abstract

Forecasting the short-term spread of an ongoing disease outbreak poses a challenge owing to the complexity of contributing factors, some of which can be characterized through interlinked, multi-modality variables, and the intersection of public policy and human behavior. Here we introduce PandemicLLM, a framework with multi-modal large language models (LLMs) that reformulates real-time forecasting of disease spread as a text-reasoning problem, with the ability to incorporate real-time, complex, non-numerical information. This approach, through an artificial intelligence–human cooperative prompt design and time-series representation learning, encodes multi-modal data for LLMs. The model is applied to the COVID-19 pandemic, and trained to utilize textual public health policies, genomic surveillance, spatial and epidemiological time-series data, and is tested across all 50 states of the United States for a duration of 19 months. PandemicLLM opens avenues for incorporating various pandemic-related data in heterogeneous formats and shows performance benefits over existing models. PandemicLLM adapts the large language model to predict disease trends by converting diverse disease-relevant data into text. It responds to new variants in real time, offering robust, interpretable forecasts for effective public health responses.
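To make the idea of treating forecasting as a text-reasoning problem more concrete, the following is a minimal sketch of how numeric and textual inputs for one state might be serialized into a single prompt. It is an illustration only, not the authors' implementation: the `StateWeek` record, its field names, the `describe_trend` and `build_prompt` helpers, the prompt wording, and the example values are all assumptions. The abstract also mentions time-series representation learning, which this plain-text serialization does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StateWeek:
    """Hypothetical record; field names are illustrative, not from the paper."""
    state: str
    weekly_hospitalizations: List[int]  # oldest value first, newest last
    policy_note: str                    # free-text public health policy summary
    dominant_variant: str               # taken from genomic surveillance


def describe_trend(series: List[int]) -> str:
    """Describe the change between the last two observations in words."""
    if len(series) < 2:
        return "unknown"
    previous, latest = series[-2], series[-1]
    if latest > previous:
        return "increasing"
    if latest < previous:
        return "decreasing"
    return "stable"


def build_prompt(record: StateWeek) -> str:
    """Serialize numeric and textual inputs into one text prompt."""
    values = ", ".join(str(v) for v in record.weekly_hospitalizations)
    return (
        f"State: {record.state}\n"
        f"Weekly hospitalizations (oldest to newest): {values}\n"
        f"Recent trend: {describe_trend(record.weekly_hospitalizations)}\n"
        f"Policy: {record.policy_note}\n"
        f"Dominant variant: {record.dominant_variant}\n"
        "Question: Will hospitalizations increase, stay stable, "
        "or decrease next week?"
    )


if __name__ == "__main__":
    # Made-up values, used only to show the prompt layout.
    example = StateWeek(
        state="Maryland",
        weekly_hospitalizations=[120, 135, 160],
        policy_note="Indoor mask mandate lifted.",
        dominant_variant="Omicron BA.2",
    )
    print(build_prompt(example))
```

The resulting string could then be passed to any language model as ordinary text; how the published framework actually structures its prompts and encodes the time series is described in the full article, not here.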

