Performance Analysis of XGBoost Algorithm to Determine the Most Optimal Parameters and Features in Predicting Stock Price Movement

Affan Ardana

Telematika, published 2023-03-01. DOI: 10.31315/telematika.v20i1.9329

Abstract

Purpose: This research aims to find the best parameters and features for predicting stock price movement with the XGBoost algorithm. Parameters are selected by RMSE value, and features are ranked by importance value.

Design/methodology/approach: The research data are the stock data of Amazon.com (AMZN). The dataset contains the Date, Low, Open, Volume, High, Close, and Adjusted Close features. Missing values are handled so that the dataset has no missing data. The input features are selected using the Pearson correlation feature selection method. To keep the gap between the highest and lowest stock prices from being too wide, the data are scaled. To reduce bias in the prediction results, cross-validation is used together with Min-Max scaling; the dataset is divided into training data and testing data covering a range of 30 days after the training data. The parameters tested are n_estimator = 500, early stopping round = 3, learning rate = 0.01, 0.05, 0.1, and max_depth (tree depth) = 3, 4, 5.

Findings/result: A learning rate of 0.05 and a tree depth of 5 gave the lowest RMSE of all the models tested, 0.009437. The Low feature obtained the highest importance value across all the models built.

Originality/value/state of the art: This study used testing data within a range of 30 days after the training data and a combination of parameters: n_estimator = 500, early stopping round = 3, learning rate = 0.01, 0.05, 0.1, and max_depth (tree depth) = 3, 4, 5.
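The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the author's code: the helper names (`min_max_scale`, `pearson`, `rmse`, `split_last_window`) are hypothetical, the 30-day test range is assumed to mean the last 30 rows of a chronologically ordered series, and the XGBoost fitting step is left out so that the example runs on the Python standard library alone.

```python
import math
from itertools import product


def min_max_scale(values):
    """Rescale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences.

    Returns 0.0 when either sequence has zero variance (a convention
    chosen for this sketch, since the coefficient is undefined there).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def split_last_window(rows, test_size=30):
    """Hold out the final `test_size` rows as test data (assumes time order)."""
    return rows[:-test_size], rows[-test_size:]


# Parameter grid from the abstract: 3 learning rates x 3 tree depths.
LEARNING_RATES = [0.01, 0.05, 0.1]
MAX_DEPTHS = [3, 4, 5]
param_grid = [
    {"learning_rate": lr, "max_depth": depth}
    for lr, depth in product(LEARNING_RATES, MAX_DEPTHS)
]
```

Under these assumptions, each of the nine entries in `param_grid` would be passed to an XGBoost model trained on the first part of the split, and the model's predictions for the held-out 30 rows would be scored with `rmse`; the combination with the lowest score is the one reported as best.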