Santoshi Kumari, Harshitha K Reddy, Chandan S Kulkarni, Vanukuri Gowthami
Global Transitions Proceedings, Volume 2, Issue 2, Pages 267-272. Published 2021-11-01. DOI: 10.1016/j.gltp.2021.08.038. https://www.sciencedirect.com/science/article/pii/S2666285X21000662
Debunking health fake news with domain specific pre-trained model
The COVID-19 pandemic has made it clearer than ever how damaging health misinformation can be. It is now much easier to publish health-related articles online without validation, and these articles are shared across social media, contributing to the spread of health fake news. Such fake news is spread with the intent to damage the image of a person or product, to increase sales of a product, or to promote a product. In recent research, many health misinformation detection models use BERT (Bidirectional Encoder Representations from Transformers), which is pretrained on unlabeled text extracted from English Wikipedia and BookCorpus, and they mostly deal with health misinformation on social media. Therefore, a self-ensemble SciBERT (Scientific BERT) based model that makes use of domain-specific word embeddings is proposed for detecting health misinformation specifically in news, which is less explored, together with a dataset that combines the existing FakeHealth dataset with a custom dataset of health articles scraped from the fact-checking website Snopes.com. Classification results show that the proposed model achieves a weighted F1 score of 0.715.
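The abstract reports its result as a weighted F1 score. As a reading aid, the sketch below shows how that metric is commonly computed: per-class F1 scores are averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the toy labels are assumptions made for illustration; they are not the authors' code or data, and the abstract does not state which implementation was used.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy labels (hypothetical, not from the paper): 1 = fake, 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.75
```

Because the weights follow class frequency, the majority class dominates the score on an imbalanced dataset, so a weighted F1 is best read alongside per-class results where those are available.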