The performance of the LSTM-based code generated by Large Language Models (LLMs) in forecasting time series data

Saroj Gopali, Sima Siami-Namini, Faranak Abri, Akbar Siami Namin
{"title":"The performance of the LSTM-based code generated by Large Language Models (LLMs) in forecasting time series data","authors":"Saroj Gopali ,&nbsp;Sima Siami-Namini ,&nbsp;Faranak Abri ,&nbsp;Akbar Siami Namin","doi":"10.1016/j.nlp.2024.100120","DOIUrl":null,"url":null,"abstract":"<div><div>Generative AI, and in particular Large Language Models (LLMs), have gained substantial momentum due to their wide applications in various disciplines. While the use of these game changing technologies in generating textual information has already been demonstrated in several application domains, their abilities in generating complex models and executable codes need to be explored. As an intriguing case is the goodness of the machine and deep learning models generated by these LLMs in conducting automated scientific data analysis, where a data analyst may not have enough expertise in manually coding and optimizing complex deep learning models and codes and thus may opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of the mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data, an important and popular data type with its prevalent applications in many application domains including financial and stock market. This research conducts a set of controlled experiments where the prompts for generating deep learning-based models are controlled with respect to sensitivity levels of four criteria including (1) Clarify and Specificity, (2) Objective and Intent, (3) Contextual Information, and (4) Format and Style. While the results are relatively mix, we observe some distinct patterns. We notice that using LLMs, we are able to generate deep learning-based models with executable codes for each dataset separately whose performance are comparable with the manually crafted and optimized LSTM models for predicting the whole time series dataset. We also noticed that ChatGPT outperforms the other LLMs in generating more accurate models. Furthermore, we observed that the goodness of the generated models vary with respect to the “temperature” parameter used in configuring LLMS. The results can be beneficial for data analysts and practitioners who would like to leverage generative AIs to produce good prediction models with acceptable goodness.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"9 ","pages":"Article 100120"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Processing Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949719124000682","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Generative AI, and in particular Large Language Models (LLMs), has gained substantial momentum due to its wide applications across disciplines. While the use of these game-changing technologies for generating textual information has already been demonstrated in several application domains, their ability to generate complex models and executable code still needs to be explored. An intriguing case is the goodness of the machine learning and deep learning models generated by these LLMs for automated scientific data analysis, where a data analyst may not have enough expertise to manually code and optimize complex deep learning models and may therefore opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of mainstream LLMs, such as ChatGPT, PaLM, LLaMA, and Falcon, in generating deep learning models for analyzing time series data, an important and popular data type with prevalent applications in many domains, including finance and the stock market. This research conducts a set of controlled experiments in which the prompts for generating deep learning-based models are controlled with respect to the sensitivity levels of four criteria: (1) Clarity and Specificity, (2) Objective and Intent, (3) Contextual Information, and (4) Format and Style. While the results are relatively mixed, we observe some distinct patterns. We find that, using LLMs, we are able to generate deep learning-based models with executable code for each dataset separately, whose performance is comparable to that of manually crafted and optimized LSTM models for predicting the whole time series dataset. We also find that ChatGPT outperforms the other LLMs in generating more accurate models. Furthermore, we observe that the goodness of the generated models varies with the "temperature" parameter used in configuring the LLMs. The results can benefit data analysts and practitioners who would like to leverage generative AI to produce prediction models of acceptable goodness.
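The abstract does not reproduce the generated code itself, but a minimal sketch of the kind of LSTM forecasting model the LLMs were prompted to produce might look like the following. All hyperparameters (window size, layer width, epochs) and the synthetic series are illustrative assumptions, not the paper's actual configuration:

```python
# Minimal sketch of an LLM-generated LSTM forecaster (illustrative only;
# window size, units, and epochs are assumptions, not the paper's setup).
import numpy as np
from tensorflow import keras

def make_windows(series, window=10):
    """Slice a 1-D series into (window -> next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    # Add a feature axis so each sample is (window, 1) for the LSTM.
    return np.array(X)[..., np.newaxis], np.array(y).reshape(-1, 1)

# Synthetic stand-in for a real time series (e.g., stock closing prices).
series = np.sin(np.linspace(0, 50, 500)) + np.random.normal(0, 0.1, 500)
X, y = make_windows(series, window=10)

model = keras.Sequential([
    keras.layers.Input(shape=(10, 1)),
    keras.layers.LSTM(50),   # single recurrent layer
    keras.layers.Dense(1),   # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

forecast = model.predict(X[-1:])  # predict the next point in the series
```

Note that the "temperature" mentioned in the abstract is a sampling parameter of the LLM that generates such code (higher values yield more varied generations), not a hyperparameter of the LSTM itself.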