Global Surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model

Yue Li, Pratheeksha Nair, Zhi Wen, I. Chafi, A. Okhmatovskaia, G. Powell, Yannan Shen, D. Buckeridge
Published in: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020-09-21
DOI: https://doi.org/10.1145/3388440.3412418
Citations: 9

Abstract

As the COVID-19 pandemic continues to unfold, understanding the global impact of non-pharmacological interventions (NPI) is important for formulating effective intervention strategies, particularly as many countries prepare for future waves. We used a machine learning approach to distill latent topics related to NPI from large-scale international news media. We hypothesize that these topics are informative about the timing and nature of implemented NPI, and that they depend on the source of the information (e.g., local news versus official government announcements) and on the target countries. Given a set of latent topics associated with NPI (e.g., self-quarantine, social distancing, online education), we assume that countries and media sources have different prior distributions over these topics, which are sampled to generate the news articles. To model the source-specific topic priors, we developed a semi-supervised, multi-source, dynamic, embedded topic model. For each news article, our model simultaneously infers latent topics and learns a linear classifier that predicts NPI labels from the article's topic mixture. To learn these models, we developed an efficient end-to-end amortized variational inference algorithm. We applied our models to news data collected and labelled by the World Health Organization (WHO) and the Global Public Health Intelligence Network (GPHIN). Through comprehensive experiments, we observed superior topic quality and intervention prediction accuracy compared to the baseline embedded topic models, which ignore information on media source and intervention labels. The inferred latent topics reveal distinct policies and media framing in different countries and media sources, and also characterize reaction to COVID-19 and NPI in a semantically meaningful manner. Our PyTorch code is available on GitHub (https://github.com/li-lab-mcgill/covid19_media).
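
The generative assumptions described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' PyTorch implementation. The function names, the toy sizes, the Gaussian random walk used for the "dynamic" prior, the logistic-normal form of the topic mixture, and the per-label sigmoid output are all assumptions made here for illustration. The abstract states only that priors are source-specific, that topics are embedded, and that a linear classifier takes topic mixtures as input. The amortized variational inference step is not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def topic_word_distributions(word_emb, topic_emb):
    """Embedded topics: beta[k][v] is proportional to exp(<word_emb[v], topic_emb[k]>)."""
    return [softmax([dot(rho, alpha) for rho in word_emb]) for alpha in topic_emb]


def evolve_prior(prev_mean, rng, step_sd=0.1):
    """Assumed dynamics: a Gaussian random walk on one source's prior mean."""
    return [m + rng.gauss(0.0, step_sd) for m in prev_mean]


def sample_topic_mixture(prior_mean, rng, sd=1.0):
    """Assumed logistic-normal draw centred on the source- and time-specific mean."""
    return softmax([rng.gauss(m, sd) for m in prior_mean])


def generate_article(theta, beta, n_words, rng):
    """Draw word ids from the theta-weighted mixture of topic-word distributions."""
    vocab_size = len(beta[0])
    word_probs = [sum(t * b[v] for t, b in zip(theta, beta)) for v in range(vocab_size)]
    return rng.choices(range(vocab_size), weights=word_probs, k=n_words)


def npi_probabilities(theta, weights, biases):
    """Linear classifier on the topic mixture; one sigmoid per label (assumed multi-label)."""
    return [1.0 / (1.0 + math.exp(-(dot(w, theta) + b))) for w, b in zip(weights, biases)]


# Toy sizes, chosen only for illustration.
rng = random.Random(0)
n_topics, vocab_size, emb_dim, n_labels = 3, 8, 4, 2
word_emb = [[rng.gauss(0, 1) for _ in range(emb_dim)] for _ in range(vocab_size)]
topic_emb = [[rng.gauss(0, 1) for _ in range(emb_dim)] for _ in range(n_topics)]
beta = topic_word_distributions(word_emb, topic_emb)

prior_t0 = [0.0] * n_topics             # one media source at time step 0
prior_t1 = evolve_prior(prior_t0, rng)  # the same source one time step later
theta = sample_topic_mixture(prior_t1, rng)
article = generate_article(theta, beta, n_words=20, rng=rng)

weights = [[rng.gauss(0, 1) for _ in range(n_topics)] for _ in range(n_labels)]
biases = [0.0] * n_labels
probs = npi_probabilities(theta, weights, biases)
```

In this sketch each media source would carry its own sequence of prior means, so two sources reporting on the same day can place different weight on the same topics. The classifier sees only `theta`, which is what would let label supervision shape the topics in a semi-supervised setting.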