Crude oil price forecasting using K-means clustering and LSTM model enhanced by dense-sparse-dense strategy
Authors: Alireza Jahandoost, Farhad Abedinzadeh Torghabeh, Seyyed Abed Hosseini, Mahboobeh Houshmand
Journal: Journal of Big Data (impact factor 8.6; JCR Q1, Computer Science, Theory & Methods)
Publication date: 2024-08-17
DOI: https://doi.org/10.1186/s40537-024-00977-8
Citations: 0
Abstract
Crude oil is an essential energy source that affects international trade, transportation, and manufacturing, which underlines its importance to the economy. Forecasts of its future price influence consumer prices and energy markets, shape the development of sustainable energy, and are essential for financial planning, economic stability, and investment decisions. However, reliable price prediction remains an open problem because crude oil prices are highly volatile. Furthermore, many state-of-the-art methods rely on signal decomposition techniques, which can increase prediction time. This paper proposes a model called K-means-dense-sparse-dense long short-term memory (K-means-DSD-LSTM), which has three main training phases for crude oil price forecasting. In the first phase, the DSD-LSTM model is trained. Next, the training portion of the data is clustered using the K-means algorithm. Finally, a copy of the trained DSD-LSTM model is fine-tuned for each resulting cluster. Fine-tuning helps each copy predict its own cluster better while it still generalizes well to the whole dataset, which reduces overfitting. The proposed model is evaluated on two widely used crude oil benchmarks: West Texas Intermediate (WTI) and Brent. Empirical evaluations demonstrated the superiority of the DSD-LSTM model over the K-means-LSTM model, and the K-means-DSD-LSTM model performed better still. Notably, the proposed method yielded promising results across diverse datasets and achieved performance competitive with existing methods, even without signal decomposition techniques.
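The three training phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to keep the example short and dependent only on NumPy, a linear autoregressive model stands in for the LSTM, the series is synthetic rather than WTI or Brent data, and every function name, the window length, the pruning ratio, the learning rates, and the rule of routing a new window to its nearest centroid at prediction time are assumptions made for illustration.

```python
import numpy as np

def make_windows(series, lag):
    """Build (lagged window, next value) training pairs from a 1-D series."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    return X, series[lag:]

def mse(w, X, y):
    """Mean squared error of the linear stand-in model."""
    return float(np.mean((X @ w - y) ** 2))

def train(w, X, y, mask=None, lr=0.05, epochs=300):
    """Gradient descent on MSE; weights where mask == 0 are held at zero."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * 2.0 * X.T @ (X @ w - y) / len(y)
        if mask is not None:
            w *= mask
    return w

def dsd_train(X, y, sparsity=0.5):
    """Phase 1: dense -> sparse -> dense training (stand-in for DSD-LSTM)."""
    w = train(np.zeros(X.shape[1]), X, y)            # dense phase
    mask = np.ones_like(w)
    n_pruned = int(sparsity * w.size)
    mask[np.argsort(np.abs(w))[:n_pruned]] = 0.0     # prune smallest magnitudes
    w = train(w * mask, X, y, mask=mask)             # sparse phase
    return train(w, X, y, lr=0.005)                  # re-dense, smaller step (assumed)

def assign(X, centers):
    """Index of the nearest centroid for each row of X."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(dists, axis=1)

def kmeans(X, k, iters=50, seed=0):
    """Phase 2: plain Lloyd's K-means on the training windows."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit(series, lag=4, k=2):
    """Run all three phases; returns the base model, centroids, per-cluster copies."""
    X, y = make_windows(series, lag)
    base = dsd_train(X, y)
    centers = kmeans(X, k)
    labels = assign(X, centers)
    experts = []
    for j in range(k):                               # Phase 3: fine-tune one copy per cluster
        in_cluster = labels == j
        if np.any(in_cluster):
            experts.append(train(base, X[in_cluster], y[in_cluster], lr=0.01, epochs=100))
        else:
            experts.append(base.copy())
    return base, centers, experts

def predict(window, centers, experts):
    """Route the window to its nearest centroid and use that cluster's model (assumed rule)."""
    window = np.asarray(window, dtype=float)
    j = int(assign(window[None, :], centers)[0])
    return float(experts[j] @ window)

# Synthetic demo series (not real price data).
rng = np.random.default_rng(0)
series = np.sin(np.arange(200) / 5.0) + 0.05 * rng.standard_normal(200)
base, centers, experts = fit(series, lag=4, k=2)
next_value = predict(series[-4:], centers, experts)
```

Because each cluster model starts from the weights of the shared base model, it retains what was learned from the full training set and only adjusts toward its own cluster, which is the intuition the abstract gives for reduced overfitting.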
About the journal:
The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.