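Two-Stage Interrupted Time Series Analysis with Machine Learning: Evaluating the Health Effects of the 2018 Wildfire Smoke Event in San Francisco County as a Case Study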

IF 5.0 · Zone 2 (Medicine) · JCR Q1 (Public, Environmental & Occupational Health)
Arnab K Dey, Yiqun Ma, Gabriel Carrasco-Escobar, Changwoo Han, François Rerolle, Tarik Benmarhnia
Journal: American Journal of Epidemiology
Publication date: 2025-07-11
Publication type: Journal Article
DOI: 10.1093/aje/kwaf147 (https://doi.org/10.1093/aje/kwaf147)
Citations: 0

Abstract

Two-Stage Interrupted Time Series Analysis with Machine Learning: Evaluating the Health Effects of the 2018 Wildfire Smoke Event in San Francisco County as a Case Study.

Randomized controlled trials (RCTs) are considered a key identification strategy for establishing causal relationships between exposures and outcomes. When evaluating the health impacts of extreme weather events, however, RCTs are generally infeasible due to ethical issues, costs, and the lack of a suitable control group. Quasi-experimental designs capitalizing on the timing of natural experiments, such as Interrupted Time Series (ITS), offer a valuable alternative for estimating causal effects when control groups are not available. This paper explores the application of a two-stage ITS framework that compares traditional autoregressive integrated moving average (ARIMA) models and two machine learning algorithms: Neural Network Autoregressive (NNETAR) and Prophet-Extreme Gradient Boosting (XGBoost). As a case study, we assess the impacts of the 2018 wildfire smoke event on respiratory hospitalizations in San Francisco County, California. We split the data into pre- and post-event periods to train and evaluate the models, perform cross-validation for hyperparameter tuning, and predict hospitalizations under the counterfactual scenario. Data and R code are provided for reproducibility. In the case study, Prophet-XGBoost showed the best model performance and was used to generate the counterfactual trends. We estimate that the 2018 smoke event resulted in a total of 92 (95% empirical confidence interval: 24, 125) excess respiratory hospitalizations (12.5% of the observed hospitalization count during the event period). Our proposed approach offers a powerful tool for assessing the effects of extreme weather events and can be broadly applied to other epidemiological contexts, such as public health policy evaluation.
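
The two-stage logic described in the abstract (fit a model on pre-event data only, forecast the counterfactual over the event window, then sum observed minus predicted counts) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' code, which is in R. It uses Python, a plain linear trend as a stand-in for ARIMA, NNETAR, or Prophet-XGBoost, a simple residual bootstrap as a stand-in for whatever procedure produced the reported empirical confidence interval, and synthetic data with an injected effect. All function names are hypothetical.

```python
import random
import statistics


def fit_linear_trend(y_pre):
    """Stage 1 (assumed stand-in model): least-squares fit of y = a + b*t
    using pre-event observations only."""
    t = range(len(y_pre))
    t_mean = statistics.fmean(t)
    y_mean = statistics.fmean(y_pre)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y_pre))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y_pre)]
    return intercept, slope, residuals


def excess_events(y, event_start, event_end, n_boot=2000, seed=0):
    """Stage 2: forecast the counterfactual over the event window and sum
    observed minus predicted counts. The interval comes from resampling
    pre-event residuals, which ignores parameter uncertainty (a simplification)."""
    intercept, slope, residuals = fit_linear_trend(y[:event_start])
    counterfactual = [intercept + slope * t for t in range(event_start, event_end)]
    observed = y[event_start:event_end]
    excess = sum(o - c for o, c in zip(observed, counterfactual))

    rng = random.Random(seed)
    boot = []
    for _ in range(n_boot):
        noise = rng.choices(residuals, k=len(counterfactual))
        boot.append(sum(o - (c + e) for o, c, e in zip(observed, counterfactual, noise)))
    boot.sort()
    lower = boot[int(0.025 * n_boot)]
    upper = boot[int(0.975 * n_boot) - 1]
    return excess, (lower, upper)


# Synthetic daily counts (not the study data): slow trend plus Gaussian noise.
data_rng = random.Random(42)
n_days, event_start, event_end = 400, 380, 394
series = [20.0 + 0.01 * t + data_rng.gauss(0.0, 2.0) for t in range(n_days)]
for t in range(event_start, event_end):
    series[t] += 5.0  # injected effect: 14 days x 5 = 70 true excess events

excess, (lower, upper) = excess_events(series, event_start, event_end)
print(f"excess = {excess:.1f}, 95% interval = ({lower:.1f}, {upper:.1f})")
```

Because the model never sees the event window during fitting, the forecast over that window serves as the "no event" scenario. As a back-of-envelope check on the reported figures, 92 excess hospitalizations being 12.5% of the observed count implies roughly 92 / 0.125 = 736 observed hospitalizations during the event period; that total is derived here and is not stated in the abstract.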

Source journal
American Journal of Epidemiology
Category: Medicine - Public, Environmental & Occupational Health
CiteScore: 7.40
Self-citation rate: 4.00%
Articles published: 221
Review time: 3-6 weeks
Journal description: The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research. It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.