Debunking health fake news with domain specific pre-trained model

Santoshi Kumari, Harshitha K Reddy, Chandan S Kulkarni, Vanukuri Gowthami
Global Transitions Proceedings, volume 2, issue 2, pages 267-272. Journal article, published 2021-11-01.
DOI: 10.1016/j.gltp.2021.08.038
https://www.sciencedirect.com/science/article/pii/S2666285X21000662
Citation count: 4

Abstract

The COVID-19 pandemic has made the impact of health misinformation clearer than ever. Health-related articles can now be published online without validation, and sharing them across social media contributes to the spread of health fake news. Such news is spread with the intent to damage the image of a person or product, to increase a product's sales, or to promote a product. In recent research, many useful health misinformation detection models use BERT (Bidirectional Encoder Representations from Transformers), which is pretrained on unlabeled text extracted from English Wikipedia and a book corpus, and they mostly deal with health misinformation on social media. This paper therefore proposes a self-ensemble SciBERT (Scientific BERT) based model that uses domain-specific word embeddings to detect health misinformation specifically in news, a less explored setting, together with a dataset that combines the existing FakeHealth dataset with a custom dataset of health articles scraped from the fact-checking website Snopes.com. Classification results show that the proposed model achieves a weighted F1 score of 0.715.
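
The abstract reports its result as a weighted F1 score. As a rough illustration of that metric only, the sketch below averages per-class F1 scores weighted by each class's share of the true labels. The function name `weighted_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper, and the authors' actual evaluation code may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` and the true label is `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class's F1 by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy labels invented for illustration; not data from the paper.
y_true = ["fake", "fake", "real", "real", "real", "real"]
y_pred = ["fake", "real", "real", "real", "real", "fake"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Weighting by support means the majority class dominates the score, which matters when real and fake articles are unevenly represented in a dataset.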
