Performance Analysis of XGBoost Algorithm to Determine the Most Optimal Parameters and Features in Predicting Stock Price Movement

Telematika Pub Date : 2023-03-01 DOI:10.31315/telematika.v20i1.9329

Affan Ardana



Abstract

Purpose: This research aims to find the best parameters and features for predicting stock price movement with the XGBoost algorithm. Parameters are selected by RMSE value, and features are ranked by importance value.

Design/methodology/approach: The research data is the stock data of Amazon.com (AMZN). The dataset contains the Date, Low, Open, Volume, High, Close, and Adjusted Close features. Missing values are handled so that the dataset contains no missing data. Input features are selected using the Pearson Correlation feature selection method. To prevent the gap between the highest and lowest stock prices from being too wide, the data is scaled using the Min Max Scaling method. To avoid bias in the prediction results, cross-validation is used, which divides the dataset into training data and testing data, with the testing data falling within a range of 30 days after the training data. The parameters tested are n_estimator = 500, early stopping round = 3, learning rate = 0.01, 0.05, 0.1, and max_depth (tree depth) = 3, 4, 5.

Findings/result: The results show that a learning rate of 0.05 and a tree depth of 5 produced the lowest RMSE of all the models tested, 0.009437. The Low feature obtained the highest importance value across all the models built.

Originality/value/state of the art: This study uses testing data within a range of 30 days after the training data and a combination of parameters comprising n_estimator = 500, early stopping round = 3, learning rate = 0.01, 0.05, 0.1, and max_depth (tree depth) = 3, 4, 5.
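The pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the author's implementation. The function names, the `fit_predict` hook, and the choice of a single split with the final 30 rows as the test window are assumptions made for illustration, since the abstract does not specify the exact cross-validation scheme or how the 30 days are counted. In practice, `fit_predict` would wrap an XGBoost regressor trained with the given learning rate and tree depth.

```python
import math
from itertools import product

def min_max_scale(values):
    """Rescale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))

def split_last_window(rows, test_size=30):
    """Chronological split: the final `test_size` rows form the test set.

    Assumption: one row per day, so 30 rows stand in for the 30-day range.
    """
    return rows[:-test_size], rows[-test_size:]

# Parameter grid taken from the abstract.
LEARNING_RATES = (0.01, 0.05, 0.1)
MAX_DEPTHS = (3, 4, 5)

def grid_search(fit_predict, train, test, test_target):
    """Return (best_rmse, learning_rate, max_depth) over the grid.

    `fit_predict(train, test, learning_rate, max_depth)` is a hypothetical
    hook that trains a model on `train` and returns predictions for `test`.
    """
    results = []
    for lr, depth in product(LEARNING_RATES, MAX_DEPTHS):
        predictions = fit_predict(train, test, lr, depth)
        results.append((rmse(test_target, predictions), lr, depth))
    return min(results)  # lowest RMSE wins
```

The Pearson step would typically be applied between each candidate feature and the target, keeping the features whose correlation is strongest. Scaling parameters would usually be computed on the training rows only, so that information from the test window does not leak into training.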

