Humans vs. large language models: Judgmental forecasting in an era of advanced AI
Mahdi Abolghasemi, Odkhishig Ganbold, Kristian Rotaru
International Journal of Forecasting, Volume 41, Issue 2, pages 631-648. Published 2024-10-08.
DOI: 10.1016/j.ijforecast.2024.07.003
https://www.sciencedirect.com/science/article/pii/S0169207024000700
Abstract
This study investigates the forecasting accuracy of human experts versus large language models (LLMs) in the retail sector, particularly during standard and promotional sales periods. Using a controlled experimental setup with 123 human forecasters and five LLMs (ChatGPT-4, ChatGPT-3.5, Bard, Bing, and Llama2), we evaluated forecasting accuracy using the absolute percentage error. Our analysis centered on the effect of the following factors on forecasters' performance: the supporting statistical model (baseline and advanced), whether the product was on promotion, and the nature of external impact. The findings indicate that LLMs do not consistently outperform humans in forecasting accuracy and that advanced statistical forecasting models do not uniformly enhance the performance of either human forecasters or LLMs. Both human and LLM forecasters exhibited increased forecasting errors, particularly during promotional periods. Our findings call for careful consideration when integrating LLMs into practical forecasting processes.
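The abstract scores each forecast by its absolute percentage error (APE). As a minimal sketch of the standard definition, APE = |actual − forecast| / |actual| × 100, and a forecaster can be summarized by averaging APE over periods. The function names, the mean-over-periods aggregation, and the sales figures below are assumptions for illustration only; they are not the study's code or data.

```python
from statistics import fmean
from typing import Sequence


def absolute_percentage_error(actual: float, forecast: float) -> float:
    """APE in percent: |actual - forecast| / |actual| * 100."""
    if actual == 0:
        # APE divides by the actual value, so it is undefined for zero sales.
        raise ValueError("APE is undefined when the actual value is zero")
    return abs(actual - forecast) * 100.0 / abs(actual)


def mean_ape(actuals: Sequence[float], forecasts: Sequence[float]) -> float:
    """Average one forecaster's per-period APEs (an assumed aggregation)."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    return fmean(absolute_percentage_error(a, f) for a, f in zip(actuals, forecasts))


# Hypothetical weekly sales and forecasts (illustrative only, not study data).
actual_sales = [120, 80, 200]
human_forecasts = [100, 90, 150]
llm_forecasts = [130, 60, 210]

print(round(mean_ape(actual_sales, human_forecasts), 2))  # 18.06
print(round(mean_ape(actual_sales, llm_forecasts), 2))    # 12.78
```

Because APE divides by the actual value, it is undefined for zero sales and grows large when actual sales are small, so the sketch raises an error instead of substituting a value.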
About the journal
The International Journal of Forecasting is a leading journal in its field that publishes high-quality refereed papers. It aims to bridge the gap between theory and practice, making forecasting useful and relevant for decision and policy makers. The journal places strong emphasis on empirical studies, evaluation activities, implementation research, and improving the practice of forecasting. It welcomes various points of view and encourages debate to find solutions to field-related problems. The journal is the official publication of the International Institute of Forecasters (IIF) and is indexed in Sociological Abstracts, Journal of Economic Literature, Statistical Theory and Method Abstracts, INSPEC, Current Contents, UMI Data Courier, RePEc, Academic Journal Guide, CIS, IAOR, and Social Sciences Citation Index.