Forecasting Tuberculosis Incidence in China using Baidu Index: A Comparative Study

Xinyue Liang, Qinneng Xu, Ruo-ping Guan, Yang Zhao
Proceedings of the 4th International Conference on Medical and Health Informatics, published 2020-08-14. DOI: 10.1145/3418094.3418129
Citations: 1

Abstract

Background: Tuberculosis is a common infectious disease that primarily affects the lungs and has high mortality and prevalence. Efficient prediction of tuberculosis incidence is important for countering epidemics and allocating resources effectively. The main objective of this study is to investigate the effectiveness of web search queries in predicting the incidence of tuberculosis in China. We conduct a comprehensive comparison of data-driven methods for predicting tuberculosis incidence.

Methods: Several data mining models are implemented, including stepwise linear regression and a support vector machine (SVM), both incorporating the Baidu Index (a record of search queries on Baidu, the main search engine in China). These two methods are compared with the traditional time series methods of autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA). In addition, to further investigate the reliability of prediction, we explore the effectiveness of integrating the individual models: a hybrid model of SARIMA and SVM, and Bayesian model averaging (BMA), are adopted to maximize the predictive utility of the models.

Results and Conclusion: The experimental results show that Internet search queries provide an effective data source for predicting tuberculosis, with predictive ability comparable to that of traditional time series models. They also show that combining two or more models, using either BMA or a hybrid model, can improve predictive ability, with BMA giving by far the best predictions in terms of both MAPE and RMSE in the five areas studied (including Guangdong, Beijing, Tianjin and Shanghai). These findings pave the way for accurate and timely prediction of tuberculosis cases, which is important in practice for allocating healthcare resources and developing strategies to counter possible future outbreaks.
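The abstract ranks the models by MAPE and RMSE and reports that Bayesian model averaging (BMA) gave the best combined forecasts. The following is a minimal Python sketch of those two metrics and of a BMA-style forecast combination. The metric definitions are the standard ones; the weighting scheme, which approximates each model's posterior probability from its Bayesian information criterion (BIC), is an assumption, since the abstract does not say how the BMA weights were estimated. The helper names (`mape`, `rmse`, `bic_weights`, `bma_forecast`) and every number in the example are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched series before computing an error metric."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes no zero observations)."""
    _check(actual, predicted)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the series (e.g. case counts)."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def bic_weights(bics: Sequence[float]) -> list[float]:
    """Assumed BMA weights: w_k proportional to exp(-BIC_k / 2), normalized to sum to 1.

    The smallest BIC is subtracted first, so the best model gets exp(0) = 1
    and the normalizing total can never underflow to zero.
    """
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_forecast(forecasts: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Combine the individual models' forecasts step by step as a weighted average."""
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


if __name__ == "__main__":
    # Hypothetical monthly case counts and model forecasts, for illustration only.
    actual = [100.0, 120.0, 110.0]
    sarima = [95.0, 125.0, 108.0]
    svm = [104.0, 115.0, 113.0]
    weights = bic_weights([210.0, 212.0])  # hypothetical BIC values for the two models
    combined = bma_forecast([sarima, svm], weights)
    for name, pred in [("SARIMA", sarima), ("SVM", svm), ("BMA", combined)]:
        print(f"{name}: MAPE={mape(actual, pred):.2f}%  RMSE={rmse(actual, pred):.2f}")
```

Full BMA also carries model uncertainty into prediction intervals; this sketch only produces the weighted point forecast on which MAPE and RMSE are evaluated.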