Automatic Identification of Self-Reported COVID-19 Vaccine Information from Vaccine Adverse Events Reporting System

IF 1.8, Zone 4 (Medicine), JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Jay S Patel, Sonya Zhan, Zasim Siddiqui, Bari Dzomba, Huanmei Wu
Methods of Information in Medicine, vol. 62, no. 1-02, pp. 49-59. Published 2023-05-01. Journal Article. DOI: 10.1055/s-0042-1760248 (https://doi.org/10.1055/s-0042-1760248)
Citations: 2

Abstract

Background: The short interval between the declaration of the coronavirus disease 2019 (COVID-19) pandemic and the authorization of vaccines raised public concern about vaccine safety and efficacy. The Food and Drug Administration uses the Vaccine Adverse Events Reporting System (VAERS), where members of the general public can report vaccine side effects in a free-text box. This information could be used to characterize self-reported vaccine side effects.

Objective: To develop a supervised and unsupervised natural language processing (NLP) pipeline that extracts self-reported COVID-19 vaccination side effects, the body locations of those side effects, medications, and possible false information or misinformation warranting further investigation, and outputs them in a structured format for analysis and reporting.
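To make "structured format" concrete, the following is a minimal Python sketch of turning one free-text report into structured fields. It is not the authors' pipeline: the abstract does not describe the model, so this uses plain keyword lookup, and the lexicons, the `ExtractedReport` class, and the `extract` function are all hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical, hand-picked lexicons for illustration only. A real system
# would need far broader vocabularies or a trained model.
SIDE_EFFECTS: Set[str] = {"soreness", "swelling", "fatigue", "fever", "headache"}
BODY_LOCATIONS: Set[str] = {"arm", "back", "chest", "neck", "face", "head"}
MEDICATIONS: Set[str] = {"tylenol", "ibuprofen", "benadryl"}


@dataclass
class ExtractedReport:
    """Structured view of one free-text report (assumed schema)."""
    side_effects: List[str] = field(default_factory=list)
    body_locations: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)


def extract(text: str) -> ExtractedReport:
    """Match lowercase word tokens against each lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())

    def matches(lexicon: Set[str]) -> List[str]:
        # Keep first-seen order and drop repeated mentions.
        return list(dict.fromkeys(t for t in tokens if t in lexicon))

    return ExtractedReport(
        side_effects=matches(SIDE_EFFECTS),
        body_locations=matches(BODY_LOCATIONS),
        medications=matches(MEDICATIONS),
    )


report = extract("Arm soreness and fatigue after the shot; took Tylenol.")
print(report)
```

Exact token matching misses misspellings, negation ("no fever"), and multi-word terms, which is one reason a learned component would be needed in practice.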

Methods: We used the VAERS dataset of COVID-19 vaccine reports from 725,246 individuals, covering November 2020 to August 2022. First, we developed a gold-standard (GS) dataset of 1,500 randomly selected records. Second, we split the GS into training, testing, and validation sets. The training set was used to develop the NLP application (supervised and unsupervised), and the testing and validation sets were used to evaluate its performance.
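The sampling and splitting steps can be sketched in Python as follows. The abstract does not state the split proportions, so the 70/15/15 ratio, the seeds, the integer record IDs, and the `split_gold_standard` helper are assumptions made purely for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_gold_standard(
    record_ids: Sequence[int],
    train_pct: int = 70,   # assumed proportion, not stated in the abstract
    test_pct: int = 15,    # assumed proportion, not stated in the abstract
    seed: int = 42,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle the IDs and cut them into train / test / validation lists."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(ids) * train_pct // 100
    n_test = len(ids) * test_pct // 100
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    validation = ids[n_train + n_test:]  # whatever remains
    return train, test, validation


# Stand-in for the 725,246 report IDs; real VAERS IDs would be loaded from data.
all_ids = range(725_246)
# Draw the 1,500-record gold standard without replacement.
gold_ids = random.Random(0).sample(all_ids, 1_500)

train_ids, test_ids, val_ids = split_gold_standard(gold_ids)
print(len(train_ids), len(test_ids), len(val_ids))
```

Fixing the seeds keeps the split reproducible, so the testing and validation records stay untouched while the application is developed on the training records.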

Results: The NLP application automatically extracted vaccine side effects, body locations of the side effects, medications, and possible misinformation with moderate to high accuracy (84% sensitivity, 82% specificity, and 83% F1 measure). We found that 23% of people (386,270) reported arm soreness, 31% (226,208) body swelling, 23% (168,160) fatigue/body weakness, and 22% (159,873) cold/flu-like symptoms. Most complications occurred in body locations such as the arm, back, chest, neck, face, and head. Over-the-counter pain medications such as Tylenol and ibuprofen and allergy medications such as Benadryl were the most commonly self-reported medications. Death due to COVID-19, changes in DNA, and infertility were among the possible false information or misinformation reported.
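As a reading aid for these figures, here is a short Python sketch of the standard definitions of sensitivity, specificity, and the F1 measure, followed by a check of the reported counts against the dataset size. The confusion counts are hypothetical, chosen only so that the outputs land near the reported 84%, 82%, and 83%; the abstract does not give the real counts. The percentage check assumes every count is taken out of all 725,246 individuals, which the abstract does not state. Under that assumption the arm soreness count of 386,270 corresponds to roughly 53% rather than 23%, while the other three counts match their stated percentages, so that figure is worth checking against the full paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that were detected (recall)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives that were correctly rejected."""
    return tn / (tn + fp)


def f1_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical confusion counts (100 positives, 100 negatives), not from the paper.
tp, fn, tn, fp = 84, 16, 82, 18
metrics = {
    "sensitivity": sensitivity(tp, fn),
    "specificity": specificity(tn, fp),
    "f1": f1_measure(tp, fp, fn),
}

# Assumed denominator: all individuals in the dataset.
TOTAL_REPORTS = 725_246
reported_counts = {
    "arm soreness": 386_270,
    "body swelling": 226_208,
    "fatigue/body weakness": 168_160,
    "cold/flu-like symptoms": 159_873,
}
# Whole-number percentage of the assumed denominator for each count.
shares = {
    name: round(100 * count / TOTAL_REPORTS)
    for name, count in reported_counts.items()
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```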

Conclusion: Some self-reported side effects, such as syncope, arthralgia, and blood clotting, need further clinical investigation. Our NLP application may help extract information from large free-text electronic datasets to support decision making by policy makers and other researchers.

Source journal
Methods of Information in Medicine (Medicine - Computer Science: Information Systems)
CiteScore: 3.70
Self-citation rate: 11.80%
Articles published: 33
Review time: 6-12 weeks
Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.