Contextualized Satire Detection in Short Texts Using Deep Learning Techniques

IF 1 | Region 4, Computer Science | JCR Q4, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Web Engineering Pub Date : 2024-01-01 DOI:10.13052/jwe1540-9589.2312

Ashraf Kamal;Muhammad Abulaish;Jahiruddin

Journal of Web Engineering, Volume 23, Issue 1, Pages 27-52
Full text (IEEE Xplore): https://ieeexplore.ieee.org/document/10488438/

Citations: 0

Abstract

Satire is prominent in user-generated content on online platforms, appearing in satirical news, customer reviews, blogs, articles, and short messages that are typically informal. Because satire is also used to spread false information on the Internet, its computational detection has become a well-known problem. Existing work focuses primarily on formal document- or sentence-level text, whereas informal short texts have received less attention. This paper presents a new model, BiLSTM self-attention (BiSAT), for detecting satire in informal short texts. Its components include an input layer, an embedding layer, a self-attention layer, and two bi-directional long short-term memory (BiLSTM) layers, which learn the contextual information that signals satire in a text. The input layer converts the text into an input vector, which the embedding layer maps to a numeric vector. The output of the embedding layer is passed to the first BiLSTM layer, which extracts contextual sequence information in both the forward and backward directions. Between the first and second BiLSTM layers, a self-attention layer highlights the important satirical information captured by the hidden layer of the first BiLSTM. The BiSAT model also uses classic feature engineering: a 13-dimensional auxiliary feature vector composed of features from four categories (sentiment, punctuation, hyperbole, and affective). The model is empirically evaluated on two benchmark datasets and a newly created dataset called Satire-280, where it outperforms existing approaches and baseline methods by a significant margin. The Satire-280 dataset and code can be downloaded from the GitHub repository: https://github.com/Ashraf-Kamal/Satire-Detection.
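The self-attention step between the two BiLSTM layers can be illustrated with a minimal sketch. The abstract does not state which attention formulation BiSAT uses, so this example assumes plain scaled dot-product attention with identity projections (no learned weights). The function names `softmax` and `self_attention` and the toy `hidden` values are hypothetical and are not taken from the authors' repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden_states):
    """Scaled dot-product self-attention with identity projections.

    ASSUMPTION: this formulation is illustrative only; the paper's abstract
    does not specify the attention variant used in BiSAT.

    hidden_states: list of T vectors (each a list of d floats), standing in
    for the per-token outputs of the first BiLSTM layer.
    Returns (contexts, weights): T re-weighted vectors and the T x T weights.
    """
    dim = len(hidden_states[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in hidden_states:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in hidden_states]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted sum of all hidden states.
        contexts.append([sum(w * vec[j] for w, vec in zip(row, hidden_states))
                         for j in range(dim)])
    return contexts, weights


# Toy hidden states for a 3-token text (made-up values, d = 4).
hidden = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
contexts, weights = self_attention(hidden)
```

Each row of `weights` sums to 1 and shows how strongly one token attends to the others. In a model of the kind the abstract describes, `contexts` would be the input to the second BiLSTM layer.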

Source journal

Journal of Web Engineering (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 1.80
Self-citation rate: 12.50%
Articles published: not listed
Review time: 9 months

Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.