ChatGPT-Assisted Deep Learning Models for Influenza-Like Illness Prediction in Mainland China: Time Series Analysis
Authors: Weihong Huang, Wudi Wei, Xiaotao He, Baili Zhan, Xiaoting Xie, Meng Zhang, Shiyi Lai, Zongxiang Yuan, Jingzhen Lai, Rongfeng Chen, Junjun Jiang, Li Ye, Hao Liang
Journal: Journal of Medical Internet Research, volume 27, article e74423 (Journal Article)
Published: 2025-06-27
DOI: 10.2196/74423 (https://doi.org/10.2196/74423)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12227151/pdf/
Impact factor: 5.8 (JCR Q1, Health Care Sciences & Services)
Citations: 0
Abstract
Background: Influenza in mainland China results in a large number of outpatient and emergency visits related to influenza-like illness (ILI) annually. While deep learning models show promise for improving influenza forecasting, their technical complexity remains a barrier to practical implementation. Large language models, such as ChatGPT, offer the potential to reduce these barriers by supporting automated code generation, debugging, and model optimization.
Objective: This study aimed to evaluate the predictive performance of several deep learning models for ILI positivity rates in mainland China and to explore the auxiliary role of ChatGPT-assisted development in facilitating model implementation.
Methods: ILI positivity rate data spanning from 2014 to 2024 were obtained from the Chinese National Influenza Center (CNIC) database. In total, 5 deep learning architectures were developed using a ChatGPT-assisted workflow covering code generation, error debugging, and performance optimization: long short-term memory (LSTM), neural basis expansion analysis for time series (N-BEATS), transformer, temporal fusion transformer (TFT), and time-series dense encoder (TiDE). Models were trained on data from 2014 to 2023 and tested on holdout data from 2024 (weeks 1-39). Performance was evaluated using mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
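The evaluation setup described in the Methods can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the (year, week, value) record layout, and all numbers are hypothetical, and the metrics use their standard textbook definitions.

```python
from math import isclose


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if any actual is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def split_by_year(records, test_year=2024, last_test_week=39):
    """Chronological holdout: earlier years train, weeks 1..last_test_week of test_year test.

    `records` is an assumed layout: a list of (year, week, value) tuples.
    """
    train = [r for r in records if r[0] < test_year]
    test = [r for r in records if r[0] == test_year and r[1] <= last_test_week]
    return train, test


# Hypothetical toy records, NOT data from the study.
records = [
    (2022, 52, 10.0),
    (2023, 1, 12.0),
    (2023, 2, 8.0),
    (2024, 1, 20.0),
    (2024, 2, 10.0),
    (2024, 40, 5.0),  # falls outside the weeks 1-39 test window
]

train, test = split_by_year(records)
actual = [value for _, _, value in test]
predicted = [18.0, 13.0]  # made-up forecasts for the two test weeks

print("MAE:", mae(actual, predicted))
print("MSE:", mse(actual, predicted))
print("MAPE (%):", mape(actual, predicted))
```

Splitting by calendar position instead of at random keeps future observations out of the training set, which is the usual reason to prefer a chronological holdout for time series.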
Results: ILI trends exhibited clear seasonal patterns with winter peaks and summer troughs, alongside marked fluctuations during the COVID-19 pandemic period (2020-2022). All 5 deep learning models were successfully constructed, debugged, and optimized with the assistance of ChatGPT. Among the 5 models, TiDE achieved the best predictive performance nationally (MAE=5.551, MSE=43.976, MAPE=72.413%) and in the southern region (MAE=7.554, MSE=89.708, MAPE=74.475%). In the northern region, where forecasting proved more challenging, TiDE still performed best (MAE=4.131, MSE=28.922), although high percentage errors remained (MAPE>400%). N-BEATS demonstrated the second-best performance nationally (MAE=9.423) and showed greater stability in the north (MAE=6.325). In contrast, transformer and TFT consistently underperformed, with national MAE values of 10.613 and 12.538, respectively. TFT exhibited the highest deviation (national MAPE=169.29%). Extreme regional disparities were observed, particularly in northern China, where LSTM and TFT generated MAPE values exceeding 1918%, despite LSTM's moderate performance in the south (MAE=9.460).
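The pairing of a low MAE with a very high MAPE in the northern region is consistent with a general property of MAPE: it divides each error by the observed value, so small observed values inflate it. The abstract does not state that this is the cause, so the sketch below is only an illustration of the property, with made-up numbers and hypothetical variable names.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical positivity rates (percent); NOT values from the study.
high_season_actual = [20.0, 25.0, 30.0]
low_season_actual = [0.5, 1.0, 0.5]

# Apply the same absolute overshoot to both series.
overshoot = 4.0
high_season_pred = [a + overshoot for a in high_season_actual]
low_season_pred = [a + overshoot for a in low_season_actual]

# MAE is identical for the two series; MAPE differs by more than an order of magnitude.
print("MAE  high/low:", mae(high_season_actual, high_season_pred),
      mae(low_season_actual, low_season_pred))
print("MAPE high/low:", mape(high_season_actual, high_season_pred),
      mape(low_season_actual, low_season_pred))
```

For this reason, MAPE values are easier to interpret alongside MAE or MSE when a series spends long stretches near zero.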
Conclusions: Deep learning models, particularly TiDE, demonstrate strong potential for accurate ILI forecasting across diverse regions of China. Furthermore, large language models like ChatGPT can substantially enhance modeling efficiency and accessibility by assisting nontechnical users in model development. These findings support the integration of AI-assisted workflows into epidemic prediction systems as a scalable approach for improving public health preparedness.
About the journal:
The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades.
As a leader in the industry, the journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor.
Notably, JMIR holds the prestigious position of being ranked #1 on Google Scholar within the "Medical Informatics" discipline.