LimeSoda: Dataset for Fake News Detection in Healthcare Domain

Patomporn Payoungkhamdee, Peerachet Porkaew, Atthasith Sinthunyathum, Phattharaphon Songphum, Witsarut Kawidam, Wichayut Loha-Udom, P. Boonkwan, Vipas Sutantayawalee
DOI: 10.1109/iSAI-NLP54397.2021.9678187 (https://doi.org/10.1109/iSAI-NLP54397.2021.9678187)
Published in: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
Publication date: 2021-12-21
Citations: 3

Abstract

In this paper, we present LIMESODA, our Thai fake news dataset in the healthcare domain, together with its construction guideline. Each document in the dataset is classified as fact, fake, or undefined. We also provide token-level annotations for validating classifier decisions. The five high-level annotation tags are (1) misleading headline, (2) imposter, (3) fabrication, (4) false connection, and (5) misleading content. We curated and manually annotated 7,191 documents with these tags. We evaluate the dataset with two deep learning approaches, RNN and Transformer baselines, and analyze token-level contributions to understand model behavior. For the RNN model, we use the attention weights as token-level contributions. For the Transformer models, we use the integrated gradient method at the embedding layers. Finally, we compare these token-level contributions with the human annotations. Although our baseline models yield promising performance, we find that the tokens supporting model decisions differ considerably from the human annotations.
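To make the token-level analysis concrete, the following is a minimal sketch of integrated gradients at the embedding level, followed by a simple overlap check against human-marked tokens. It is an illustration only and not the paper's implementation: the toy linear-sigmoid classifier, the all-zero baseline, the example embeddings, the `human_marked` positions, and the `top_k_overlap` measure are all assumptions introduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(embeddings, w, b):
    """Toy classifier (assumption): sigmoid of a linear score summed over all token embeddings."""
    z = b + sum(wj * ej for emb in embeddings for wj, ej in zip(w, emb))
    return sigmoid(z)

def grad_wrt_embeddings(embeddings, w, b):
    """Analytic gradient of the toy model output with respect to each embedding entry."""
    s = model(embeddings, w, b)
    return [[s * (1.0 - s) * wj for wj in w] for _ in embeddings]

def integrated_gradients(embeddings, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients, summed per token."""
    n, d = len(embeddings), len(w)
    avg_grad = [[0.0] * d for _ in range(n)]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the actual input.
        point = [[bl + alpha * (e - bl) for e, bl in zip(emb, base)]
                 for emb, base in zip(embeddings, baseline)]
        g = grad_wrt_embeddings(point, w, b)
        for i in range(n):
            for j in range(d):
                avg_grad[i][j] += g[i][j] / steps
    # Scale the averaged gradient by (input - baseline) and sum over embedding dimensions.
    return [sum((embeddings[i][j] - baseline[i][j]) * avg_grad[i][j] for j in range(d))
            for i in range(n)]

def top_k_overlap(scores, human_marked, k):
    """Fraction of the k highest-scoring token positions that humans also marked."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return len(set(top) & set(human_marked)) / k

# Hypothetical 4-token document with 2-dimensional embeddings.
embeddings = [[0.2, -0.1], [1.5, 0.8], [-0.3, 0.1], [0.9, 1.2]]
baseline = [[0.0, 0.0] for _ in embeddings]
w, b = [0.7, 0.4], -0.5

attributions = integrated_gradients(embeddings, baseline, w, b)
human_marked = {1, 2}  # hypothetical positions an annotator highlighted
overlap = top_k_overlap(attributions, human_marked, k=2)
print(attributions, overlap)
```

A useful sanity check for any integrated-gradients implementation is completeness: the attributions should sum to approximately the difference between the model output on the input and on the baseline. A low overlap score in a setup like this would correspond to the kind of mismatch between model-supporting tokens and human annotations that the abstract reports.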