Deep transfer learning for COVID-19 fake news detection in Persian

IF 2.781

The Journal of Physical Chemistry Pub Date : 2022-04-03 DOI:10.1111/exsy.13008

Masood Ghayoomi, Maryam Mousavian


Citations: 0

Abstract



The spread of fake news on social media has increased dramatically in recent years, and fake news detection systems have attracted researchers' attention worldwide. The COVID-19 outbreak in 2019 and the ensuing worldwide epidemic made the importance of this issue more apparent. As a result, many researchers have begun to collect English datasets and to study COVID-19 fake news detection. However, for many low-resource languages, including Persian, accurate tools for automatic COVID-19 fake news detection cannot be developed because annotated data for the task is lacking. In this article, we aim to develop a Persian corpus in the COVID-19 domain in which fake news is annotated, and to provide a model for detecting Persian COVID-19 fake news. Given the impressive progress of multilingual pre-trained language models, cross-lingual transfer learning can be used to improve the generalization of models trained on low-resource language datasets. Accordingly, we use the state-of-the-art deep cross-lingual contextualized language model XLM-RoBERTa together with parallel convolutional neural networks to detect Persian COVID-19 fake news. Moreover, we transfer knowledge across domains to improve the results, using both the English COVID-19 dataset and the general-domain Persian fake news dataset. The combination of cross-lingual and cross-domain transfer learning outperformed the other models and beat the baseline by a significant 2.39%.
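The abstract names parallel convolutional neural networks applied on top of XLM-RoBERTa but gives no architectural details. The sketch below is only an illustration of the general "parallel CNN" idea: several convolution branches with different kernel widths read the same sequence of token embeddings, each filter is max-pooled over time, and the pooled values are concatenated into one feature vector for a classifier. Everything specific here is an assumption and not taken from the paper: the kernel widths (2, 3, 4), the number of filters, the embedding size, the random weights, and the helper names `conv1d_max`, `parallel_cnn_features`, and `make_branches`. The random vectors stand in for real XLM-RoBERTa token embeddings.

```python
import random


def conv1d_max(seq, kernel, bias=0.0):
    """Slide one filter over the token sequence, apply ReLU, then max-pool over time.

    seq    : list of token vectors (each a list of floats)
    kernel : list of weight vectors, one per position in the filter window
    """
    width = len(kernel)
    best = 0.0  # ReLU floor: negative activations are clipped to zero
    for start in range(len(seq) - width + 1):
        act = bias
        for offset in range(width):
            token_vec = seq[start + offset]
            weights = kernel[offset]
            act += sum(x * w for x, w in zip(token_vec, weights))
        best = max(best, act)
    return best


def parallel_cnn_features(seq, branches):
    """Run every branch on the same input and concatenate the pooled outputs."""
    features = []
    for filters in branches:  # one branch per kernel width
        features.extend(conv1d_max(seq, k) for k in filters)
    return features


def make_branches(rng, dim, widths=(2, 3, 4), filters_per_width=2):
    """Build randomly initialised filters (hypothetical sizes, for illustration only)."""
    return [
        [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)]
         for _ in range(filters_per_width)]
        for w in widths
    ]


rng = random.Random(0)
dim, n_tokens = 8, 10
# Stand-in for contextual token embeddings; a real pipeline would obtain
# these from a pre-trained multilingual encoder such as XLM-RoBERTa.
token_embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
branches = make_branches(rng, dim)
features = parallel_cnn_features(token_embeddings, branches)
print(len(features))  # 3 kernel widths x 2 filters each = 6
```

In a trained model the concatenated vector would feed a classification layer that outputs a fake/real label. Under the transfer-learning setup the abstract describes, the same network would presumably be trained first on the English COVID-19 and general-domain Persian data and then on the Persian COVID-19 corpus, although the abstract does not state the exact training schedule.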
