HeadlineStanceChecker: Exploiting summarization to detect headline disinformation

IF 2.1 · Zone 3 (Computer Science) · JCR Q3: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics · Pub Date: 2021-11-01 · DOI: 10.1016/j.websem.2021.100660

Robiert Sepúlveda-Torres, Marta Vicente, Estela Saquete, Elena Lloret, Manuel Palomar


Citations: 9

Abstract

The headline of a news article is designed to succinctly summarize its content, providing the reader with a clear understanding of the news item. Unfortunately, in the post-truth era, headlines are more focused on attracting the reader’s attention for ideological or commercial reasons, thus leading to mis- or disinformation through false or distorted headlines. One way of combating this, although a challenging task, is by determining the relation between the headline and the body text to establish the stance. Hence, to contribute to the detection of mis- and disinformation, this paper proposes an approach (HeadlineStanceChecker) that determines the stance of a headline with respect to the body text to which it is associated. The novelty rests on the use of a two-stage classification architecture that uses summarization techniques to shape the input for both classifiers instead of directly passing the full news body text, thereby reducing the amount of information to be processed while keeping important information. Specifically, summarization is done through Positional Language Models leveraging on semantic resources to identify salient information in the body text that is then compared to its corresponding headline. The results obtained show that our approach achieves 94.31% accuracy for the overall classification and the best FNC-1 relative score compared with the state of the art. It is especially remarkable that the system, which uses only the relevant information provided by the automatic summaries instead of the whole text, is able to classify the different stance categories with very competitive results, especially in the discuss stance between the headline and the news body text. It can be concluded that using automatic extractive summaries as input of our approach together with the two-stage architecture is an appropriate solution to the problem.
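The FNC-1 relative score cited in the abstract can be illustrated with a short sketch. It assumes the weighting used by the public Fake News Challenge (FNC-1) scorer: 0.25 for getting the related/unrelated decision right, plus a further 0.75 when the exact stance of a related pair (agree, disagree, discuss) is also correct. The function names and the toy labels below are hypothetical and are not taken from the paper.

```python
# Minimal sketch of FNC-1 style scoring (assumed weighting: 0.25 / 0.75).
# Function names and example labels are illustrative, not from the paper.

RELATED = {"agree", "disagree", "discuss"}


def fnc1_score(gold, predicted):
    """Return the weighted FNC-1 style score for two parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    score = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            # Exact match: 0.25 for any class ...
            score += 0.25
            if g != "unrelated":
                # ... plus 0.50 when the matched class is a related stance.
                score += 0.50
        if g in RELATED and p in RELATED:
            # Both labels are on the "related" side: 0.25.
            score += 0.25
    return score


def fnc1_relative_score(gold, predicted):
    """Score divided by the best achievable score on the same gold labels."""
    best = fnc1_score(gold, gold)
    return fnc1_score(gold, predicted) / best if best else 0.0


gold = ["unrelated", "agree", "discuss", "disagree"]
pred = ["unrelated", "agree", "agree", "unrelated"]

raw = fnc1_score(gold, pred)           # 0.25 + 1.0 + 0.25 + 0.0
rel = fnc1_relative_score(gold, pred)  # raw divided by 3.25
print(f"raw={raw:.2f} relative={rel:.4f}")
```

Under this weighting, an error between two related stances (for example, predicting agree for a discuss pair) still earns partial credit, while confusing related with unrelated earns none. This is one reason a relative score is reported alongside plain accuracy.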


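Source journal

Journal of Web Semantics (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.20

Self-citation rate: 12.00%

Articles published:

Review time: 14.6 weeks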

Journal description: The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include knowledge technologies, ontology, agents, databases and the semantic grid. Disciplines such as information retrieval, language technology, human-computer interaction and knowledge discovery are also of major relevance. All aspects of Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged, to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services. The journal emphasizes the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications.