Fine-tuning BERT to classify COVID19 tweets containing symptoms

Rajarshi Roychoudhury, S. Naskar
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task. Published 2021-06-01. DOI: 10.18653/V1/2021.SMM4H-1.30 (https://doi.org/10.18653/V1/2021.SMM4H-1.30)
Citations: 1

Abstract

Twitter is a valuable source of patient-generated data that has been used in various population health studies. The first step in many of these studies is to identify and capture Twitter messages (tweets) containing medication mentions. Identifying personal mentions of COVID19 symptoms requires distinguishing them from other mentions, such as symptoms reported by others and references to news articles or other sources. In this article, we describe our submission to Task 6 of the Social Media Mining for Health Applications (SMM4H) Shared Task 2021. This task challenged participants to classify tweets into three target classes: (1) self-reports, (2) non-personal reports, and (3) literature/news mentions. Our system used handcrafted preprocessing and word embeddings from a BERT encoder model. We achieved an F1 score of 93%.
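The abstract does not spell out the handcrafted preprocessing, so the following is only a minimal sketch of what such a step might look like before the text is passed to a BERT tokenizer. The specific rules (dropping URLs and user handles, unwrapping hashtags, lowercasing), the function name `preprocess_tweet`, and the integer ids in `LABELS` are all assumptions made for illustration, not the authors' actual pipeline.

```python
import re

# The three target classes named in the abstract.
# The key names and integer ids are assumptions for illustration.
LABELS = {"self_report": 0, "nonpersonal_report": 1, "lit_news_mention": 2}

# Hypothetical cleaning rules; the paper's real rules are not given in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text: str) -> str:
    """Clean a raw tweet before tokenization (illustrative rules only)."""
    text = URL_RE.sub(" ", text)        # drop links
    text = USER_RE.sub(" ", text)       # drop @user handles
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop the '#'
    text = SPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace
    return text.lower()


if __name__ == "__main__":
    raw = "@bob I lost my sense of smell #COVID19 https://t.co/abc"
    print(preprocess_tweet(raw))
```

The cleaned string would then be tokenized and encoded by a BERT model, with a three-way classification layer on top; that part is omitted here because the abstract gives no details on the model variant or training setup.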