From Policy to Prediction: Assessing Forecasting Accuracy in an Integrated Framework with Machine Learning and Disease Models.

IF 1.4 | Zone 4 (Biology) | JCR Q4, BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology | Pub Date: 2024-11-01 | Epub Date: 2024-08-02 | DOI: 10.1089/cmb.2023.0377

Amit K Chakraborty, Hao Wang, Pouria Ramazi

Pages: 1104-1117

Citations: 0

Abstract

To improve the accuracy of infectious-disease spread forecasts, a hybrid model was recently introduced in which the disease transmission rate, commonly assumed to be constant, is instead estimated from enforced mitigation-policy data by a machine learning (ML) model and then fed into an extended susceptible-infected-recovered (SIR) model to forecast the number of infected cases. Because that work tested only one ML model, a gradient boosting model (GBM), it left open whether other ML models would perform better. Here, we compared GBMs, linear regressions, k-nearest neighbors, and Bayesian networks (BNs) in forecasting the number of COVID-19 infected cases in the United States and Canadian provinces, based on policy indices for the next 35 days. There was no significant difference in the mean absolute percentage errors (MAPEs) of these ML models over the combined dataset [$H(3) = 3.10$, $p = 0.38$]. In two provinces, a significant difference was observed [$H(3) = 8.77$ and $H(3) = 8.07$; $p < 0.05$], yet post hoc tests revealed no significant pairwise differences. Nevertheless, BNs significantly outperformed the other models on most of the training datasets. The results suggest that the ML models have equal forecasting power overall and that BNs are best suited to data-fitting applications.
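The hybrid pipeline described in the abstract (an ML model maps policy indices to a transmission rate, an SIR-type model turns that rate into case forecasts, and the candidate models are compared by MAPE with a Kruskal-Wallis test) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_beta_from_policy` is a hypothetical linear stand-in for the trained GBM, linear regression, kNN, or BN; `simulate_sir` is a plain discrete-time SIR rather than the paper's extended model; and the p-value uses the usual chi-square approximation with 3 degrees of freedom (four models minus one).

```python
import math
from typing import List, Sequence


def toy_beta_from_policy(stringency: float, base_beta: float = 0.3) -> float:
    """HYPOTHETICAL stand-in for the trained ML regressor (GBM, LR, kNN, or BN).

    Maps a policy index in [0, 100] to a daily transmission rate; stricter
    policy lowers beta. The coefficients are illustrative, not fitted.
    """
    return base_beta * (1.0 - 0.5 * stringency / 100.0)


def simulate_sir(betas: Sequence[float], gamma: float,
                 s0: float, i0: float, r0: float) -> List[float]:
    """Plain discrete-time SIR with a time-varying beta, one Euler step per day.

    Returns I_t for t = 0..len(betas). ASSUMPTION: the paper uses an
    extended SIR model; this basic form only illustrates the coupling.
    """
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    infected = [i]
    for beta in betas:
        new_inf = beta * s * i / n  # S -> I flow driven by the policy-dependent beta
        new_rec = gamma * i         # I -> R flow at a constant recovery rate
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return infected


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H using average ranks for ties plus the standard tie correction."""
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of this tie block
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(
        rs * rs / len(grp) for rs, grp in zip(rank_sums, groups)) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def chi2_sf_df3(x: float) -> float:
    """Upper-tail chi-square probability for 3 degrees of freedom (4 models - 1)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


if __name__ == "__main__":
    policy = [20.0] * 10 + [70.0] * 25  # HYPOTHETICAL 35-day policy-index path
    betas = [toy_beta_from_policy(p) for p in policy]
    infected = simulate_sir(betas, gamma=0.1, s0=9_900.0, i0=100.0, r0=0.0)
    print(f"forecast infected on day 35: {infected[-1]:.1f}")
    print(f"p for H(3) = 3.10: {chi2_sf_df3(3.10):.2f}")  # 0.38
```

Plugging in the H values reported above, `chi2_sf_df3` gives p ≈ 0.38 for the combined dataset and p < 0.05 for both provincial statistics, which is consistent with the abstract. The closed form for 3 degrees of freedom avoids a SciPy dependency; `scipy.stats.kruskal` computes the same tie-corrected H and its chi-square p-value directly.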




Source journal

Journal of Computational Biology (Category: Biology / Computer Science, Interdisciplinary Applications)

CiteScore: 3.60
Self-citation rate: 5.90%
Articles published: 113
Review time: 6-12 weeks

Journal description: Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analyses of methods, as well as their practical impact. Available only online, it is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics. Journal of Computational Biology coverage includes:
- Genomics
- Mathematical modeling and simulation
- Distributed and parallel biological computing
- Designing biological databases
- Pattern matching and pattern detection
- Linking disparate databases and data
- New tools for computational biology
- Relational and object-oriented database technology for bioinformatics
- Biological expert system design and use
- Reasoning by analogy, hypothesis formation, and testing by machine
- Management of biological databases