OCHADAI at SMM4H-2021 Task 5: Classifying self-reporting tweets on potential cases of COVID-19 by ensembling pre-trained language models

Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task. Pub Date: 2021-06-01. DOI: 10.18653/V1/2021.SMM4H-1.25

Ying Luo, L. Pereira, Kobayashi Ichiro


Abstract

Since the outbreak of the coronavirus at the end of 2019, there have been numerous studies on the coronavirus in the NLP arena. Meanwhile, Twitter has been a valuable source of news and a public medium for conveying information and personal expression. This paper describes the system developed by the Ochadai team for the Social Media Mining for Health Applications (SMM4H) 2021 Task 5, which aims to automatically distinguish English tweets that self-report potential cases of COVID-19 from those that do not. We propose a model ensemble that leverages pre-trained representations from COVID-Twitter-BERT (Müller et al., 2020), RoBERTa (Liu et al., 2019), and Twitter-RoBERTa (Glazkova et al., 2021). Our model obtained an F1-score of 76% on the test set in the evaluation phase and 77.5% in the post-evaluation phase.
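The abstract does not say how the three models' outputs are combined, so the following is only a minimal sketch of one common choice, soft voting, in which each model's predicted probability for the positive (self-report) class is averaged and then thresholded. The function names, the 0.5 threshold, and all probability values are hypothetical and chosen for illustration; they are not taken from the paper. The F1 helper scores the positive class, which is also an assumption here.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each model's positive-class probability per tweet, then threshold.

    prob_sets holds one sequence per model; every sequence must cover the
    same tweets in the same order.
    """
    n_items = len(prob_sets[0])
    if any(len(p) != n_items for p in prob_sets):
        raise ValueError("all models must score the same number of tweets")
    averaged = [sum(p[i] for p in prob_sets) / len(prob_sets) for i in range(n_items)]
    return [1 if a >= threshold else 0 for a in averaged]


def f1_positive(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1-score for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical positive-class probabilities for four tweets (illustrative only).
ct_bert_probs = [0.9, 0.2, 0.6, 0.4]
roberta_probs = [0.8, 0.1, 0.4, 0.7]
twitter_roberta_probs = [0.7, 0.3, 0.3, 0.6]

predictions = soft_vote([ct_bert_probs, roberta_probs, twitter_roberta_probs])
gold_labels = [1, 0, 1, 1]  # hypothetical gold labels
score = f1_positive(gold_labels, predictions)
```

With these made-up inputs the ensemble labels the first and fourth tweets as self-reports and misses the third, even though one model scored it above the threshold; averaging lets the two less confident models outvote it. Other combination schemes, such as majority voting over hard labels or weighting models by validation performance, would slot into the same structure.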

