BioFinBERT: Finetuning Large Language Models (LLMs) to Analyze Sentiment of Press Releases and Financial Text Around Inflection Points of Biotech Stocks

Valentina Aparicio, Daniel Gordon, Sebastian G. Huayamares, Yuhuai Luo
arXiv - QuantFin - Trading and Market Microstructure. Published 2024-01-19. DOI: arxiv-2401.11011 (https://doi.org/arxiv-2401.11011)

Abstract

Large language models (LLMs) are deep learning models used to perform natural language processing tasks in fields ranging from the social sciences to finance and the biomedical sciences. Developing and training a new LLM can be very computationally expensive, so it is becoming common practice to take an existing LLM and finetune it on a carefully curated dataset for the desired application. Here, we present BioFinBERT, a finetuned LLM that performs financial sentiment analysis of public text associated with the stocks of companies in the biotechnology sector. The stocks of biotech companies developing highly innovative and risky therapeutic drugs tend to rise sharply after a successful clinical readout or regulatory approval of their drug and to fall sharply after a failure. Biotech companies disclose these clinical or regulatory results via press releases, which in many cases are followed by a significant stock response. To design an LLM capable of analyzing the sentiment of these press releases, we first finetuned BioBERT, a biomedical language representation model designed for biomedical text mining, using financial textual databases. The finetuned model, termed BioFinBERT, was then used to perform financial sentiment analysis of various biotech-related press releases and financial text around inflection points that significantly affected the price of biotech stocks.
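As a rough illustration of the setup described above, the following Python sketch shows how a BioBERT checkpoint might be given a sentiment classification head and how that head's raw scores could be turned into a label. It is a sketch under stated assumptions, not the authors' implementation: the three-way label set, the checkpoint name `dmis-lab/biobert-base-cased-v1.1`, and the helper names `label_from_logits` and `load_biobert_classifier` are all illustrative choices that the abstract does not specify. The finetuning loop itself is omitted.

```python
import math

# Assumed three-way label set; the abstract does not state the paper's labels.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def load_biobert_classifier(checkpoint="dmis-lab/biobert-base-cased-v1.1"):
    """Load BioBERT with a freshly initialized 3-class head for finetuning.

    Requires the `transformers` package and network access. The checkpoint
    name is an assumption for illustration, not taken from the paper.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(LABELS)
    )
    return tokenizer, model
```

After finetuning on labeled financial sentences, the model's output logits for a press release sentence would be passed to `label_from_logits` to obtain a sentiment label and its probability.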