Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating

Minghao Zhu, Yoonkyoung Song, Ge Jin, Keyuan Jiang
{"title":"Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating","authors":"Minghao Zhu, Yoonkyoung Song, Ge Jin, Keyuan Jiang","doi":"10.18653/v1/2020.louhi-1.14","DOIUrl":null,"url":null,"abstract":"Post-market surveillance, the practice of monitoring the safe use of pharmaceutical drugs is an important part of pharmacovigilance. Being able to collect personal experience related to pharmaceutical product use could help us gain insight into how the human body reacts to different medications. Twitter, a popular social media service, is being considered as an important alternative data source for collecting personal experience information with medications. Identifying personal experience tweets is a challenging classification task in natural language processing. In this study, we utilized three methods based on Facebook’s Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use: the first one combines the pre-trained RoBERTa model with a classifier, the second combines the updated pre-trained RoBERTa model using a corpus of unlabeled tweets with a classifier, and the third combines the RoBERTa model that was trained with our unlabeled tweets from scratch with the classifier too. Our results show that all of these approaches outperform the published methods (Word Embedding + LSTM) in classification performance (p < 0.05), and updating the pre-trained language model with tweets related to medications could even improve the performance further.","PeriodicalId":448872,"journal":{"name":"International Workshop on Health Text Mining and Information Analysis","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Health Text Mining and Information Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2020.louhi-1.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Post-market surveillance, the practice of monitoring the safe use of pharmaceutical drugs is an important part of pharmacovigilance. Being able to collect personal experience related to pharmaceutical product use could help us gain insight into how the human body reacts to different medications. Twitter, a popular social media service, is being considered as an important alternative data source for collecting personal experience information with medications. Identifying personal experience tweets is a challenging classification task in natural language processing. In this study, we utilized three methods based on Facebook’s Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use: the first one combines the pre-trained RoBERTa model with a classifier, the second combines the updated pre-trained RoBERTa model using a corpus of unlabeled tweets with a classifier, and the third combines the RoBERTa model that was trained with our unlabeled tweets from scratch with the classifier too. Our results show that all of these approaches outperform the published methods (Word Embedding + LSTM) in classification performance (p < 0.05), and updating the pre-trained language model with tweets related to medications could even improve the performance further.
使用预训练RoBERTa语言模型识别药物效果的个人体验推文及其更新
上市后监测,即对药品安全使用的监测,是药物警戒的重要组成部分。能够收集与药品使用相关的个人经验可以帮助我们深入了解人体对不同药物的反应。热门社交媒体服务Twitter被认为是收集个人用药体验信息的重要替代数据源。在自然语言处理中,识别个人体验推文是一项具有挑战性的分类任务。在本研究中,我们利用基于Facebook稳健优化的BERT预训练方法(RoBERTa)的三种方法来预测与药物使用相关的个人体验推文:第一种方法将预训练的RoBERTa模型与分类器结合,第二种方法将更新的预训练RoBERTa模型使用未标记推文语料库与分类器结合,第三种方法将RoBERTa模型与我们的未标记推文重新训练并与分类器结合。我们的研究结果表明,所有这些方法在分类性能上都优于已发表的方法(Word Embedding + LSTM) (p < 0.05),并且使用与药物相关的推文更新预训练的语言模型甚至可以进一步提高性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信