NIFTY Financial News Headlines Dataset

Raeid Saqur, Ken Kato, Nicholas Vinden, Frank Rudzicz
Source: arXiv - QuantFin - Computational Finance. Published: 2024-05-16. Identifier: arxiv-2405.09747.

Abstract

We introduce and make publicly available the NIFTY Financial News Headlines dataset, designed to facilitate and advance research in financial market forecasting using large language models (LLMs). The dataset comes in two versions tailored to different modeling approaches: (i) NIFTY-LM, which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive, causal language-modeling objective, and (ii) NIFTY-RL, formatted specifically for alignment methods such as reinforcement learning from human feedback (RLHF), to align LLMs via rejection sampling and reward modeling. Each version provides curated, high-quality data incorporating comprehensive metadata, market indices, and deduplicated financial news headlines, systematically filtered and ranked to suit modern LLM frameworks. We also include experiments demonstrating applications of the dataset in tasks such as stock price movement prediction, and examining the role of LLM embeddings in information acquisition/richness. The NIFTY dataset, along with utilities (such as systematically truncating a prompt's context length), is available on Hugging Face at https://huggingface.co/datasets/raeidsaqur/NIFTY.
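To make the ideas of deduplicated headlines and systematic truncation of a prompt's context length concrete, here is a minimal Python sketch. It is not the dataset's own utility code: the function names `dedupe_headlines`, `truncate_to_budget`, and `build_prompt` are hypothetical, and counting whitespace-separated words is an assumed stand-in for a real tokenizer.

```python
from typing import Iterable, List


def dedupe_headlines(headlines: Iterable[str]) -> List[str]:
    """Drop repeated headlines, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for headline in headlines:
        # Normalize case and whitespace so trivially different copies match.
        key = " ".join(headline.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(headline.strip())
    return unique


def truncate_to_budget(headlines: List[str], max_tokens: int) -> List[str]:
    """Keep headlines in order until a whitespace-token budget is used up."""
    kept = []
    used = 0
    for headline in headlines:
        # Assumption: word count approximates the token count of a real tokenizer.
        cost = len(headline.split())
        if used + cost > max_tokens:
            break
        kept.append(headline)
        used += cost
    return kept


def build_prompt(headlines: Iterable[str], max_tokens: int) -> str:
    """Join deduplicated, truncated headlines into a newline-separated prompt."""
    return "\n".join(truncate_to_budget(dedupe_headlines(headlines), max_tokens))
```

In practice the budget would likely be measured with the target model's tokenizer, and headlines would be ranked before truncation so that the most relevant ones survive the cut.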