Named Entity Recognition Based Neural Network Framework for Stock Trend Prediction Using Latent Dirichlet Allocation

IF 2.9 | Zone 4, multidisciplinary journals | JCR Q2, MULTIDISCIPLINARY SCIENCES

Arabian Journal for Science and Engineering Pub Date : 2025-03-17 DOI:10.1007/s13369-025-10090-4

Manas Ranjan Prusty, Apoorv Kumar Sinha, Sanskriti Sanjay Kumar Singh, Shreyas Sai, Vijayakumar Kedalu Poornachary, Subhra Rani Patra

Volume 50, Issue 19, pp. 16135-16148

Citations: 0

Abstract


Named Entity Recognition Based Neural Network Framework for Stock Trend Prediction Using Latent Dirichlet Allocation


Stock price prediction is an extensively researched topic because accurate prediction of stock trends is decisive in the investment marketing sphere. As market giants publish more and more opinions about particular stocks on the internet, the need to study these sentiments in detail for future predictions grows. The natural text in these internet articles is generated by examining factors that affect stock values, so the texts are treated as reliable features for this study. The motivation is that conglomerates and businesses can tangibly understand the aftermath of articles that mobilize public opinion and steer it in a certain direction. The aim of this study is to apply named entity recognition (NER) within a neural network framework for stock trend prediction, through latent Dirichlet allocation on the natural texts generated from internet articles. This method is used to identify the words that occur most frequently and add the most information to the corpus, depending on the topic's importance. The model then adopts the K × K words that have the most decisive impact on the constructed target class and uses them to alter the generated sparse density matrix. The proposed NER-based neural network was fitted on a real-world dataset and performed well in comparison with state-of-the-art models developed by other researchers. Because the model does not use BERT tokenizers, it cannot be evaluated against the FinBERT model; instead, the preprocessed data is fed to a pruned recurrent neural network whose training is stopped by a simple callback function. The final result was a strong tetrachoric correlation of 0.81 between the test target class and the predicted target class. The model thus offers a different approach to natural language processing for stock prediction, especially for data with high sparse density.
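The abstract reports a tetrachoric correlation between the test and predicted target classes but does not say how it was computed. As a rough illustration only, the sketch below uses the classical cosine-pi approximation, r ≈ cos(π / (1 + √(ad / bc))), on a 2 × 2 agreement table. This is an assumed stand-in, not the authors' procedure; a maximum-likelihood estimate would generally give somewhat different values. The function names `confusion_counts` and `tetrachoric_approx` are hypothetical, and the labels in the usage note are made up, not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count the 2x2 agreement table for binary labels coded as 0/1.

    Returns (a, b, c, d) where
      a = true 1, predicted 1    b = true 1, predicted 0
      c = true 0, predicted 1    d = true 0, predicted 0
    """
    a = b = c = d = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            a += 1
        elif t == 1 and p == 0:
            b += 1
        elif t == 0 and p == 1:
            c += 1
        else:  # assumes labels are strictly 0 or 1
            d += 1
    return a, b, c, d


def tetrachoric_approx(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a and d are the concordant cells, b and c the discordant cells.
    """
    if b * c == 0:
        if a * d == 0:
            raise ValueError("degenerate table: approximation is undefined")
        return 1.0  # no disagreement at all: limiting value of the formula
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))
```

With made-up labels, `tetrachoric_approx(*confusion_counts([1, 1, 0, 0], [1, 0, 1, 0]))` is approximately zero, since agreement is at chance level. The value moves toward 1 as the discordant cells shrink.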


Source journal

Arabian Journal for Science and Engineering (Multidisciplinary Sciences)

CiteScore: 5.70

Self-citation rate: 3.40%

Articles published: 993

About the journal: King Fahd University of Petroleum & Minerals (KFUPM) partnered with Springer to publish the Arabian Journal for Science and Engineering (AJSE). AJSE, which has been published by KFUPM since 1975, is a recognized national, regional and international journal that provides a great opportunity for the dissemination of research advances from the Kingdom of Saudi Arabia, MENA and the world.