Reinforcement prompting for financial synthetic data generation

JCR: Q1, Mathematics
Journal: Journal of Finance and Data Science
Publication type: Journal Article
Publication date: 2024-08-03
DOI: 10.1016/j.jfds.2024.100137
URL: https://www.sciencedirect.com/science/article/pii/S2405918824000229
Citations: 0

Abstract

The emergence of Large Language Models (LLMs) has unlocked unprecedented potential for comprehending and generating human-like text, fueling advances in the finance domain, where such models can help shape investment strategies and market predictions. Nevertheless, two challenges remain: the need for extensive labeled data and the imperative of data privacy. Generating high-quality synthetic data is a promising way to circumvent both. In this paper, we introduce a novel methodology, named “Reinforcement Prompting”, to address these challenges. Our strategy employs a policy network as a Selector to generate prompts, and an LLM as an Executor to produce financial synthetic data. This synthetic data generation process preserves data privacy and mitigates the dependency on real-world labeled datasets. We validate the effectiveness of our approach through experimental evaluations. Our results indicate that models trained on synthetic data generated via our approach perform competitively with those trained on actual financial data, thereby bridging the performance gap. This research provides a novel solution to the challenges of data privacy and labeled data scarcity in financial sentiment analysis, and advances the field of financial machine learning.
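
The Selector/Executor loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the prompt space, the reward signal, or the training algorithm, so everything below is an assumption made for illustration: the prompt list is hypothetical, the Executor is a stub standing in for an LLM call, the reward is a toy function standing in for a downstream evaluation signal, and plain REINFORCE with a running-mean baseline is swapped in as one common way to train a policy over discrete choices. It is not the authors' implementation.

```python
import math
import random

# Hypothetical prompt set: (sentiment label, prompt text). Not from the paper.
PROMPTS = [
    ("positive", "Write a short financial news sentence with positive sentiment."),
    ("negative", "Write a short financial news sentence with negative sentiment."),
    ("neutral", "Write a short financial news sentence with neutral sentiment."),
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def executor(label, prompt, rng):
    """Stub for the LLM Executor: returns a placeholder (text, label) pair."""
    return (f"<generated text {rng.randrange(10**6)} for: {prompt}>", label)


def toy_reward(sample):
    """Toy reward (assumption): pretend the downstream model lacks negative examples."""
    _, label = sample
    return 1.0 if label == "negative" else 0.0


def train_selector(steps=500, lr=0.5, seed=0):
    """REINFORCE with a running-mean baseline over a softmax prompt policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(PROMPTS)  # Selector parameters, one logit per prompt
    baseline = 0.0
    dataset = []
    for _ in range(steps):
        probs = softmax(logits)
        # Selector samples a prompt; Executor turns it into a synthetic sample.
        action = rng.choices(range(len(PROMPTS)), weights=probs)[0]
        label, prompt = PROMPTS[action]
        sample = executor(label, prompt, rng)
        dataset.append(sample)
        reward = toy_reward(sample)
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k].
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            logits[k] += lr * advantage * (indicator - probs[k])
        baseline += 0.1 * (reward - baseline)
    return softmax(logits), dataset
```

Calling `train_selector()` returns the final prompt probabilities and the list of generated samples. Under the toy reward, the policy shifts probability mass toward the "negative" prompt; in a realistic setting the reward would more plausibly come from how well a model trained on the synthetic samples performs on held-out data.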

Source journal
Journal of Finance and Data Science (Mathematics: Statistics and Probability)
CiteScore: 3.90
Self-citation rate: 0.00%
Articles published: 15
Review time: 30 days