Vassilis Plachouras, Jochen L. Leidner, Andrew G. Garrow
Proceedings of the 7th 2016 International Conference on Social Media & Society. Published 2016-07-11. DOI: 10.1145/2930971.2930977 (https://doi.org/10.1145/2930971.2930977)
Quantifying Self-Reported Adverse Drug Events on Twitter: Signal and Topic Analysis
When a drug that is sold exhibits side effects, a well-functioning ecosystem of pharmaceutical drug suppliers includes responsive regulators and pharmaceutical companies. Existing systems for monitoring adverse drug events, such as the FDA Adverse Event Reporting System (FAERS) in the US, have shown limited effectiveness due to the lack of incentives for healthcare professionals and patients. While social media present opportunities to mine information about adverse events in near real time, there are still important questions to be answered in order to understand their impact on pharmacovigilance. First, it is not known how many relevant social media posts occur per day on platforms like Twitter, i.e., whether there is "enough signal" for a post-market pharmacovigilance program based on Twitter mining. Second, it is not known what other topics are discussed by users in posts mentioning pharmaceutical drugs. In this paper, we outline how social media can be used as a human sensor for drug use monitoring. We introduce a large-scale, near real-time system for computational pharmacovigilance, and use our system to estimate the order of magnitude of the volume of daily self-reported pharmaceutical drug side effect tweets. The processing pipeline comprises a set of cascaded filters, followed by a supervised machine learning classifier. The cascaded filters quickly reduce the volume to a manageable sub-stream, from which a Support Vector Machine (SVM) classifier identifies adverse events using a rich set of features that take into account surface-textual properties, as well as domain knowledge about drugs, side effects and the Twitter medium. Using a dataset of 10,000 manually annotated tweets, an SVM classifier achieves F1=60.4% and AUC=0.894. The yield of the classifier for a drug universe comprising 2,600 keywords is 721 tweets per day. We also investigate what other topics are discussed in the posts mentioning pharmaceutical drugs. We conclude by suggesting an ecosystem where regulators and pharmaceutical companies use social media to obtain feedback about the consequences of pharmaceutical drug use.
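
The pipeline shape described in the abstract, cheap cascaded filters that shrink the stream before a more expensive classifier runs, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-word keyword set (the paper's drug universe has about 2,600 keywords), the retweet and link filters, and `toy_classifier`, which is a rule-based stand-in for the trained SVM rather than an SVM itself.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical drug keywords, for illustration only.
DRUG_KEYWORDS = {"ibuprofen", "prozac", "metformin"}

Filter = Callable[[str], bool]


def is_not_retweet(tweet: str) -> bool:
    """Assumed filter: drop retweets, which are unlikely to be self-reports."""
    return not tweet.startswith("RT ")


def has_no_link(tweet: str) -> bool:
    """Assumed filter: drop tweets with links (often adverts or news)."""
    return "http://" not in tweet and "https://" not in tweet


def mentions_drug(tweet: str) -> bool:
    """Keep tweets that contain at least one drug keyword."""
    tokens = {tok.strip(".,!?#@").lower() for tok in tweet.split()}
    return bool(tokens & DRUG_KEYWORDS)


def cascade(tweets: Iterable[str], filters: List[Filter]) -> Iterator[str]:
    """Yield tweets that pass every filter; all() stops at the first failure,
    so filters listed first should be the cheapest or most selective."""
    for tweet in tweets:
        if all(f(tweet) for f in filters):
            yield tweet


def toy_classifier(tweet: str) -> bool:
    """Stand-in for the trained SVM: flags a few hand-picked cue phrases."""
    cues = ("gave me", "made me", "side effect")
    return any(cue in tweet.lower() for cue in cues)


def detect_adverse_events(
    tweets: Iterable[str],
    filters: List[Filter],
    classifier: Callable[[str], bool],
) -> List[str]:
    """Run the classifier only on the sub-stream that survives the cascade."""
    return [t for t in cascade(tweets, filters) if classifier(t)]


if __name__ == "__main__":
    sample = [
        "RT Prozac made me dizzy",
        "Prozac made me so dizzy today",
        "Buy ibuprofen cheap https://example.com",
        "Picked up my metformin refill",
    ]
    filters = [is_not_retweet, has_no_link, mentions_drug]
    for hit in detect_adverse_events(sample, filters, toy_classifier):
        print(hit)
```

Because the classifier is passed in as a callable, a trained model could be substituted for `toy_classifier` without changing the cascade; the filters only decide which tweets reach it.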