Time Series Clustering to Improve Dengue Cases Forecasting with Deep Learning
J. V. Bogado, D. Stalder, C. Schaerer, Santiago Gómez-Guerrero
2021 XLVII Latin American Computing Conference (CLEI), pp. 1-10, 25 October 2021
DOI: 10.1109/CLEI53233.2021.9640130
Citations: 3
Abstract
Dengue fever is a public health problem, and accurate forecasts can help governments take the best preventive actions. As the volume of available data continues to grow, machine learning and deep learning (DL) models have become an attractive approach. However, accurate predictions are difficult to obtain in areas with few cases. In this work, we compare traditional approaches such as LASSO Regression (LR), Random Forest (RF), and Support Vector Regression (SVR) with DL models based on long short-term memory (LSTM), using weekly dengue incidence and climate data for 217 cities in Paraguay. Models built for individual cities may exhibit heterogeneous behavior and poor accuracy. To mitigate this problem, we perform a clustering analysis of the time series and evaluate it with silhouette scores, which measure how well each observation fits its assigned cluster. Our results indicate that hierarchical clustering combined with Spearman correlation is the most appropriate approach. We then compare several LSTM models trained on subgroups of similar time series. In terms of root mean squared error (RMSE), the clustered LSTM models improve accuracy by approximately 31.6%. The main contribution of this work is to show that clustered LSTM models can forecast dengue cases in low-incidence cities by combining information from similar time series and weather data.
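The clustering step described above can be illustrated with a minimal sketch in pure Python (standard library only). It computes pairwise Spearman correlations between city time series, runs agglomerative hierarchical clustering, and picks the number of clusters with the highest mean silhouette score. Several details are assumptions not stated in the abstract: the distance transform d = 1 - rho, average linkage, treating constant (e.g. all-zero) series as uncorrelated, and the function names `ranks`, `spearman`, `distance_matrix`, `average_linkage`, `silhouette`, and `best_k`, which are hypothetical. The sketch stops at cluster assignment and does not train the per-cluster LSTM models.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            r[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant series (e.g. zero cases) is uncorrelated
    return cov / (sx * sy)


def distance_matrix(series):
    """Pairwise distances d = 1 - rho (assumed transform), ranging over [0, 2]."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - spearman(series[i], series[j])
    return d


def average_linkage(d, k):
    """Bottom-up hierarchical clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest mean inter-member distance.
        _, a, b = min(
            (mean(d[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(d, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    n = len(d)
    scores = []
    for i in range(n):
        same = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(d[i][j] for j in range(n) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def best_k(series, k_values):
    """Return the k with the highest mean silhouette score and its labels."""
    d = distance_matrix(series)
    results = {k: silhouette(d, average_linkage(d, k)) for k in k_values}
    k = max(results, key=results.get)
    return k, average_linkage(d, k)


# Hypothetical weekly case counts for four cities (toy data, not from the paper).
series = [
    [0, 1, 5, 9, 4, 1, 0, 0],    # early-season peak
    [1, 2, 8, 12, 6, 2, 1, 0],   # early-season peak, larger scale
    [0, 0, 1, 2, 4, 8, 11, 6],   # late-season peak
    [0, 1, 0, 1, 3, 6, 9, 5],    # late-season peak
]
k, labels = best_k(series, range(2, len(series)))
print(k, labels)  # 2 [0, 0, 1, 1]
```

Because Spearman correlation compares rank order rather than magnitudes, series with the same seasonal shape group together regardless of scale. This may be one reason a rank-based measure suits cities whose case counts differ by orders of magnitude.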