Advancing Retail Predictions: Integrating Diverse Machine Learning Models for Accurate Walmart Sales Forecasting

Cyril Neba C., Gerard Shu F., Gillian Nsuh, Philip Amouda A., Adrian Neba F., F. Webnda, Victory Ikpe, Adeyinka Orelaja, Nabintou Anissia Sylla
Asian Journal of Probability and Statistics. Published 2024-06-11. DOI: 10.9734/ajpas/2024/v26i7626 (https://doi.org/10.9734/ajpas/2024/v26i7626)

Abstract

Accurate sales prediction is central to informed decision-making and operational optimization in retail analytics. This study applies a range of machine learning methods to improve the precision of Walmart sales forecasting, using a dataset sourced from Kaggle. Exploratory data analysis reveals intricate patterns and temporal dependencies in the data, which motivates the use of advanced predictive modeling techniques. The study implements linear regression alongside ensemble methods such as Random Forest, Gradient Boosting Machines (GBM), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) to identify the most effective approach for predicting Walmart sales.

Comparative analysis of model performance shows that advanced machine learning algorithms outperform traditional linear models. XGBoost emerges as the best predictor, with the lowest Mean Absolute Error (MAE) of 1226.471, a Root Mean Squared Error (RMSE) of 1700.981, and an R-squared value of 0.9999900, indicating near-perfect predictive accuracy. This performance far surpasses that of simpler models such as linear regression, which yielded an MAE of 35632.510 and an RMSE of 80153.858.

Bias and fairness measurements underscore the effectiveness of advanced models in mitigating bias and delivering equitable predictions across temporal segments. Bias levels varied across models. Linear Regression, Multiple Regression, and GLM exhibited moderate bias, suggesting some systematic error in their predictions. Decision Tree showed slightly higher bias, while Random Forest showed negative bias, implying systematic underestimation. GBM, XGBoost, and LightGBM (LGB), by contrast, displayed biases closer to zero, indicating minimal systematic error. Notably, the XGBoost model demonstrated the lowest bias, with a bias value of -7.548432 (Table 4), reflecting its ability to minimize prediction errors across different conditions. Fairness analysis further indicated that XGBoost maintained robust performance in both holiday and non-holiday periods, with an MAE of 84273.385 for holidays and 1757.721 for non-holidays.

The fairness measurements showed that Linear Regression, Multiple Regression, and GLM performed consistently across both subgroups. Decision Tree performed similarly for holiday predictions but was more accurate for non-holiday sales, whereas Random Forest, XGBoost, GBM, and LGB had lower MAE values for the non-holiday subgroup, indicating potential fairness issues in predicting holiday sales.

The study also highlights the importance of model selection and the impact of advanced machine learning techniques on predictive accuracy and fairness. Ensemble methods such as Random Forest and GBM also performed strongly: Random Forest achieved an MAE of 12238.782 and an RMSE of 19814.965, and GBM achieved an MAE of 10839.822 and an RMSE of 1700.981.

This research emphasizes the value of sophisticated analytics tools for navigating the complexities of retail operations and driving strategic decision-making. With advanced machine learning models, retailers can produce more accurate sales forecasts, leading to better inventory management and greater operational efficiency. The study reaffirms the potential of data-driven approaches to drive business growth and innovation in the retail sector.
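
The evaluation measures named in the abstract (MAE, RMSE, R-squared, bias, and MAE split by holiday and non-holiday periods) can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' code: the function names, the toy numbers, and the sign convention for bias (predicted minus actual, so a negative value means under-prediction on average) are all assumptions introduced here.

```python
import math
from typing import Dict, List, Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: average error magnitude; never negative."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_bias(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean signed error (assumed convention: predicted - actual)."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mae_by_group(
    actual: Sequence[float],
    predicted: Sequence[float],
    is_holiday: Sequence[bool],
) -> Dict[bool, float]:
    """MAE computed separately for holiday (True) and non-holiday (False) rows."""
    errors: Dict[bool, List[float]] = {}
    for a, p, h in zip(actual, predicted, is_holiday):
        errors.setdefault(h, []).append(abs(a - p))
    return {h: sum(errs) / len(errs) for h, errs in errors.items()}


# Toy values for illustration only; these are NOT figures from the study.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 380.0]
is_holiday = [False, False, True, True]

print("MAE:", mae(actual, predicted))              # 12.5
print("RMSE:", rmse(actual, predicted))
print("R^2:", r_squared(actual, predicted))
print("Bias:", mean_bias(actual, predicted))       # -2.5
print("MAE by group:", mae_by_group(actual, predicted, is_holiday))
```

Because MAE averages absolute values it cannot be negative, whereas the signed bias can be; a gap between the two subgroup MAEs is the kind of holiday versus non-holiday disparity the abstract describes as a potential fairness issue.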