Developing a Generic Predictive Computational Model using Semantic data Pre-Processing with Machine Learning Techniques and its application for Stock Market Prediction Purposes

2022 IEEE 24th Conference on Business Informatics (CBI). Pub Date: 2022-06-01. DOI: 10.1109/CBI54897.2022.00013

Natalia Yerashenia, D. C. Y. Fee, A. Bolotov



Abstract

In this paper, we present a Generic Predictive Computational Model (GPCM) and apply it by building a use case for forecasting the FTSE 100 index. This involves mining heterogeneous data using semantic methods (ontology), graph-based methods (knowledge graphs, graph databases) and advanced Machine Learning methods. The main focus of our research is data pre-processing aimed at a more efficient selection of input features. Each cycle of the GPCM pipeline propagates the (initially raw) data to a Graph Database structured by an ontology, and regularly updates the features' weights in the Graph Database through a feedback loop from the Machine Learning Engine. Queries against the Graph Database return the most valuable features, which in turn serve as the input for the Machine Learning-based prediction. The end product of this process is fed back to the Graph Database to update the weights. We report on practical experiments evaluating the effectiveness of the GPCM in forecasting the FTSE 100 index. The underlying dataset contains multiple parameters related to predicting time-series data, a task for which Long Short-Term Memory (LSTM) is known to be one of the most efficient machine learning methods. The most challenging task has been to overcome a known restriction of LSTM, which is capable of analysing only one input parameter. We solved this problem by combining several parallel LSTMs, a Concatenation Unit, which merges the LSTMs' outputs into a time-series matrix, and a Linear Regression Unit, which produces the final result.
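The architecture named at the end of the abstract (parallel LSTMs, a concatenation unit, a linear regression unit) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. It makes several assumptions: each LSTM has a single hidden unit, all weights are random and untrained, and the linear unit is applied to every row of the concatenated matrix. The names `ScalarLSTM`, `concatenate`, `linear_unit` and `forecast`, as well as the feature names and values in the demo, are hypothetical.

```python
from __future__ import annotations

import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ScalarLSTM:
    """Univariate LSTM with one hidden unit (forward pass only, untrained)."""

    def __init__(self, rng: random.Random) -> None:
        # One (input weight, recurrent weight, bias) triple per gate:
        # i = input gate, f = forget gate, o = output gate, g = candidate.
        self.params = {
            gate: tuple(rng.uniform(-0.5, 0.5) for _ in range(3))
            for gate in "ifog"
        }

    def _pre(self, gate: str, x: float, h: float) -> float:
        w_x, w_h, b = self.params[gate]
        return w_x * x + w_h * h + b

    def run(self, series: list[float]) -> list[float]:
        """Return the hidden output h_t for every time step of one feature."""
        h, c = 0.0, 0.0
        outputs = []
        for x in series:
            i = sigmoid(self._pre("i", x, h))
            f = sigmoid(self._pre("f", x, h))
            o = sigmoid(self._pre("o", x, h))
            g = math.tanh(self._pre("g", x, h))
            c = f * c + i * g          # update the cell state
            h = o * math.tanh(c)       # expose the gated cell state
            outputs.append(h)
        return outputs


def concatenate(per_feature_outputs: list[list[float]]) -> list[list[float]]:
    """Merge N output sequences of length T into a T x N matrix."""
    return [list(row) for row in zip(*per_feature_outputs)]


def linear_unit(matrix: list[list[float]], weights: list[float], bias: float) -> list[float]:
    """Apply one linear combination to every row of the matrix."""
    return [sum(w * v for w, v in zip(weights, row)) + bias for row in matrix]


def forecast(features: dict[str, list[float]], seed: int = 0):
    """Run one LSTM per feature, concatenate, then apply the linear unit."""
    rng = random.Random(seed)
    lstms = {name: ScalarLSTM(rng) for name in features}
    outputs = [lstms[name].run(series) for name, series in features.items()]
    matrix = concatenate(outputs)
    weights = [rng.uniform(-1.0, 1.0) for _ in features]  # untrained weights
    return matrix, linear_unit(matrix, weights, bias=0.0)


if __name__ == "__main__":
    # Made-up, already normalised series; not data from the paper.
    demo = {
        "close": [0.10, 0.20, 0.30, 0.40],
        "volume": [0.50, 0.40, 0.30, 0.20],
        "fx_rate": [0.00, 0.10, 0.00, 0.10],
    }
    matrix, predictions = forecast(demo)
    print(len(matrix), "time steps x", len(matrix[0]), "features")
    print("last-step output:", predictions[-1])
```

In a trained system the LSTM and regression weights would be fitted to historical data. The sketch only shows how one input series per LSTM can be combined into a single prediction.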
