Detecting Factual and Non-Factual Content in News Articles

Ishan Sahu, Debapriyo Majumdar
Venue: Proceedings of the 4th ACM IKDD Conferences on Data Sciences
Published: 2017-03-09
DOI: 10.1145/3041823.3041837 (https://doi.org/10.1145/3041823.3041837)
Citations: 2

Abstract

News articles are a major source of facts about the current state and events of the world around us. However, not all news articles are equally rich in facts. In this paper, we consider the problem of detecting the factual and non-factual parts of news articles. We present a comprehensive survey of the existing literature on fact classification in news articles, as well as on a related and more widely studied problem: subjectivity versus objectivity classification of statements. Combining these techniques with some new features, we design a framework for classifying facts and non-facts in news articles. We present extensive experiments on this task using several features, and combinations of them, on two datasets, one of which was used for subjectivity classification in previous work. We show that standard dataset-dependent textual features such as n-grams produce good results on both datasets, whereas more general features such as part-of-speech tags and entity types produce inconsistent results. We analyze the results in light of the nature of the datasets to offer insights into the usefulness of the features and their applicability to the classification task we consider.
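
To make the idea of n-gram features concrete, here is a minimal sketch of a sentence classifier built on word unigrams and bigrams. It is not the authors' framework: the toy sentences, the labels "factual" and "non-factual", the `ngrams` helper, and the small `NaiveBayes` class with add-one smoothing are all assumptions chosen for illustration, and the abstract does not say which classifier the paper uses.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts (hypothetical, for illustration)."""

    def fit(self, texts, labels):
        self.priors = Counter(labels)          # number of sentences per label
        self.counts = defaultdict(Counter)     # n-gram counts per label
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        n_docs = sum(self.priors.values())

        def score(label):
            # Add-one smoothing: every vocabulary n-gram gets a pseudo-count of 1.
            total = sum(self.counts[label].values()) + len(self.vocab)
            s = math.log(self.priors[label] / n_docs)
            for g in ngrams(text):
                if g in self.vocab:            # ignore n-grams never seen in training
                    s += math.log((self.counts[label][g] + 1) / total)
            return s

        return max(self.priors, key=score)


# Toy training data, invented for this sketch.
train = [
    ("the company reported revenue of 5 million dollars", "factual"),
    ("the minister signed the agreement on monday", "factual"),
    ("i think the decision is terrible", "non-factual"),
    ("this is probably the worst policy ever", "non-factual"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])

print(model.predict("i think the policy is terrible"))
print(model.predict("the company signed the agreement"))
```

Because the features are the literal word sequences seen in training, a model like this is tied to the vocabulary of its dataset, which is the sense in which the abstract calls n-grams dataset dependent.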