Detecting Adverse Drug Reactions from User-Generated Twitter Data: A Case Study
Authors: M. Shah, Maitry Patel, Priyank Patel, Xing Tan
Venue: 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
Published: 2022-11-01
DOI: 10.1109/WI-IAT55865.2022.00087 (https://doi.org/10.1109/WI-IAT55865.2022.00087)
Adverse drug reactions (ADRs) are unwanted drug effects that contribute to mortality and morbidity in health care. Health-related topics are discussed widely across social media. The plethora of information available on social media and health-related forums, together with the rich expression of public opinion, has recently piqued the public health community's interest in using these sources for pharmacovigilance. We investigate the role of sentiment analysis features in detecting ADR mentions in a user-generated dataset obtained from the Twitter streaming API. Our proposed model combines BERT and a CNN with a support vector machine (SVM) as the final layer to classify ADR mentions. We extracted tweets from Twitter using the Tweepy API and performed data pre-processing, annotation, and augmentation to build a strong corpus. For augmentation, we used the MarianMT model to increase the number of tweets through back-translation. We passed this corpus to the BERT-Base model to obtain word embeddings and then used a CNN to extract important features from them. To improve classification performance, we used an SVM to classify each tweet. Our evaluation shows that the proposed model achieved 92% accuracy and a 78% F1 score. Data augmentation and the pre-trained BERT model are the key components of the proposed model and help it achieve better results than other machine learning models.
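The back-translation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state the pivot language, the filtering rules, or how labels were carried over, so everything below is an assumption. The `Translator` type, the `back_translate` function, and the toy dictionary translators are hypothetical names introduced only for illustration. In a real setup, the two callables would presumably wrap a pair of MarianMT models, one for each translation direction.

```python
from typing import Callable, List, Tuple

# Hypothetical interface (assumption): a translator maps one text to its translation.
Translator = Callable[[str], str]


def back_translate(
    samples: List[Tuple[str, int]],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, int]]:
    """Return the original samples plus label-preserving paraphrases.

    Each text is translated into a pivot language and back. A round-trip
    result is kept only if it is non-empty and not already in the corpus.
    """
    augmented = list(samples)
    seen = {text.strip().lower() for text, _ in samples}
    for text, label in samples:
        paraphrase = from_pivot(to_pivot(text)).strip()
        key = paraphrase.lower()
        # Skip round trips that reproduce an existing tweet (no new surface form).
        if paraphrase and key not in seen:
            seen.add(key)
            augmented.append((paraphrase, label))  # the paraphrase inherits the label
    return augmented


# Toy stand-ins for real translation models (assumption, illustration only).
_EN_TO_FR = {
    "this drug gave me a headache": "ce médicament m'a donné mal à la tête",
    "feeling fine today": "je me sens bien aujourd'hui",
}
_FR_TO_EN = {
    "ce médicament m'a donné mal à la tête": "this medication gave me a headache",
    "je me sens bien aujourd'hui": "feeling fine today",
}


def toy_en_to_fr(text: str) -> str:
    return _EN_TO_FR.get(text.lower(), text)


def toy_fr_to_en(text: str) -> str:
    return _FR_TO_EN.get(text.lower(), text)


# Label 1 marks an ADR mention, label 0 a non-ADR tweet (made-up examples).
tweets = [("This drug gave me a headache", 1), ("Feeling fine today", 0)]
augmented = back_translate(tweets, toy_en_to_fr, toy_fr_to_en)

for text, label in augmented:
    print(label, text)
```

In this toy run, the first tweet gains a paraphrase with the same label, while the second tweet round-trips to itself and is dropped as a duplicate. Filtering duplicates is a design choice of this sketch; the abstract does not say whether the authors did the same.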