From Policy to Prediction: Assessing Forecasting Accuracy in an Integrated Framework with Machine Learning and Disease Models.

IF 1.4, Zone 4 (Biology), JCR Q4, BIOCHEMICAL RESEARCH METHODS
Journal of Computational Biology Pub Date : 2024-11-01 Epub Date: 2024-08-02 DOI:10.1089/cmb.2023.0377
Amit K Chakraborty, Hao Wang, Pouria Ramazi
Pages: 1104-1117. Publication type: Journal Article. Publication model: Epub. Open access: no.
Citations: 0

Abstract

To improve the accuracy of infectious-disease forecasts, a hybrid model was recently introduced in which the disease transmission rate, commonly assumed constant, is instead estimated from data on enforced mitigation policies by a machine learning (ML) model and then fed to an extended susceptible-infected-recovered model to forecast the number of infected cases. Because that work tested only one ML model, the gradient boosting model (GBM), it left open whether other ML models would perform better. Here, we compared GBMs, linear regressions, k-nearest neighbors, and Bayesian networks (BNs) in forecasting the number of COVID-19-infected cases in the United States and Canadian provinces based on policy indices for the following 35 days. There was no significant difference in the mean absolute percentage errors of these ML models over the combined dataset [H(3) = 3.10, p = 0.38]. In two provinces, a significant difference was observed [H(3) = 8.77 and H(3) = 8.07, p < 0.05], yet post hoc tests revealed no significant difference in pairwise comparisons. Nevertheless, BNs significantly outperformed the other models in most of the training datasets. The results suggest that the ML models have equal forecasting power overall and that BNs are best suited to data-fitting applications.
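The pipeline the abstract describes (policy index, then transmission rate, then compartmental forecast, then error comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `beta_from_policy` is a hypothetical linear stand-in for the ML estimator, the model is a plain SIR rather than the paper's extended one, and all parameter values are invented. The abstract does not name its significance test; an H statistic with 3 degrees of freedom across four models is consistent with a Kruskal-Wallis test, so that is what `kruskal_wallis_h` assumes.

```python
from typing import Callable, List, Sequence


def beta_from_policy(policy_index: float) -> float:
    """Hypothetical stand-in for the ML model: stricter policy, lower transmission rate."""
    beta_max, beta_min = 0.40, 0.08  # assumed per-day bounds, not from the paper
    x = min(max(policy_index, 0.0), 100.0) / 100.0  # clamp index to [0, 100]
    return beta_max - (beta_max - beta_min) * x


def forecast_infected(
    policy: Sequence[float],
    s0: float,
    i0: float,
    r0: float,
    gamma: float = 0.1,  # assumed recovery rate per day
    estimator: Callable[[float], float] = beta_from_policy,
) -> List[float]:
    """Forward-Euler SIR (daily steps) with a policy-driven transmission rate."""
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    infected = []
    for p in policy:
        beta = estimator(p)  # transmission rate varies day by day
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        infected.append(i)
    return infected


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted((v, g) for g, vals in enumerate(groups) for v in vals)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    k = 0
    while k < n:
        j = k
        while j < n and pooled[j][0] == pooled[k][0]:
            j += 1
        avg_rank = (k + 1 + j) / 2.0  # mean of the 1-based ranks k+1 .. j
        for _, g in pooled[k:j]:
            rank_sums[g] += avg_rank
        t = j - k
        tie_term += t ** 3 - t
        k = j
    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(vals) for rs, vals in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0
```

In a comparison like the one reported, each candidate estimator would be passed as `estimator`, its 35-day forecasts scored with `mape` per region, and the per-model MAPE samples handed to `kruskal_wallis_h` as four groups; the resulting H would then be compared against a chi-squared distribution with 3 degrees of freedom.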

Source journal
Journal of Computational Biology
Category: Biology - Computer Science: Interdisciplinary Applications
CiteScore: 3.60
Self-citation rate: 5.90%
Articles published: 113
Review time: 6-12 weeks
Journal description: Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics. Journal of Computational Biology coverage includes:
- Genomics
- Mathematical modeling and simulation
- Distributed and parallel biological computing
- Designing biological databases
- Pattern matching and pattern detection
- Linking disparate databases and data
- New tools for computational biology
- Relational and object-oriented database technology for bioinformatics
- Biological expert system design and use
- Reasoning by analogy, hypothesis formation, and testing by machine
- Management of biological databases