Detecting Adverse Drug Reactions from User-Generated Twitter Data: A Case Study

M. Shah, Maitry Patel, Priyank Patel, Xing Tan
Published in: 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
Publication date: 2022-11-01
DOI: 10.1109/WI-IAT55865.2022.00087 (https://doi.org/10.1109/WI-IAT55865.2022.00087)
Citations: 0

Abstract

Adverse drug reactions (ADRs) are defined as unwanted drug effects that cause mortality and morbidity in health care. Health-related subjects are discussed across a broad span of social media conversations. The plethora of information available on social media and health-related forums, together with the rich expression of public opinion, has recently piqued the public health community's interest in using these sources for pharmacovigilance. We investigate the role of sentiment analysis characteristics in detecting ADR mentions, based on a user-generated dataset obtained from the Twitter streaming API. Our proposed model uses a BERT-CNN model with a final support vector machine (SVM) layer to classify ADR mentions. In our study, we extracted tweets from Twitter using the Tweepy API and performed data pre-processing, data annotation, and data augmentation to create a strong corpus. For data augmentation, we used the Marian MT model to increase the number of tweets through back translation. We passed this corpus to the BERT-Base model to obtain word embeddings and then used a CNN to extract important features from the data. To improve efficiency, we used an SVM to classify each tweet. The evaluation shows that our proposed model achieved 92% accuracy and a 78% F1 score. Data augmentation and the pre-trained BERT model are the key components of our proposed model and help us achieve better results than other machine learning models.
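The abstract reports 92% accuracy alongside a 78% F1 score. A gap of this kind is common when the positive class (here, tweets that mention an ADR) is a minority, because accuracy also rewards correctly rejected negatives while F1 does not. The sketch below works through the arithmetic. The confusion counts are hypothetical, chosen only so that the numbers land near the reported figures. They are not taken from the paper, and the `Confusion` class is an illustrative helper, not part of the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; the positive class is 'tweet mentions an ADR'."""
    tp: int  # ADR tweets correctly flagged
    fp: int  # non-ADR tweets wrongly flagged
    fn: int  # ADR tweets that were missed
    tn: int  # non-ADR tweets correctly ignored

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall; true negatives play no role.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)


# HYPOTHETICAL counts for 1000 tweets, 180 of which mention an ADR.
# These are NOT the paper's results; they only illustrate the arithmetic.
example = Confusion(tp=140, fp=40, fn=40, tn=780)

print(f"accuracy={example.accuracy:.2f} f1={example.f1:.2f}")
# prints "accuracy=0.92 f1=0.78"
```

With these assumed counts, the 780 true negatives lift accuracy to 0.92, while F1 depends only on the 140 true positives and the 80 errors, giving about 0.78.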