Dataset Analysis and Augmentation for Emoji-Sensitive Irony Detection

Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Pub Date: 2019-11-01
DOI: 10.18653/v1/d19-5527

Shirley Anugrah Hayati, Aditi Chaudhary, Naoki Otani, A. Black

Citations: 3

Abstract

Irony detection is an important task with applications in identifying online abuse and harassment. Given the ubiquitous use of non-verbal cues such as emojis in social media, this work studies the role of these cues in irony detection. Because existing irony detection datasets contain fewer than 10% ironic tweets with emoji, classifiers trained on them are insensitive to emojis. We propose an automated pipeline for creating a more balanced dataset.
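To make the dataset statistic concrete, the following is a minimal sketch of how such a proportion could be measured. It takes one reading of the figure (the fraction of all tweets that are both ironic and contain an emoji) and is not the authors' pipeline. The names `EMOJI_RE`, `has_emoji`, and `ironic_emoji_fraction` are hypothetical, the emoji matcher is a rough heuristic over a few Unicode blocks, and the toy tweets are invented for illustration.

```python
import re

# Rough emoji matcher (assumption): covers a few common Unicode blocks only,
# not the full Unicode emoji specification.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def has_emoji(text):
    """Return True if the text contains at least one character matched by EMOJI_RE."""
    return EMOJI_RE.search(text) is not None


def ironic_emoji_fraction(examples):
    """Fraction of all examples that are labeled ironic and contain an emoji.

    `examples` is a sequence of (text, is_ironic) pairs.
    """
    if not examples:
        return 0.0
    hits = sum(1 for text, is_ironic in examples if is_ironic and has_emoji(text))
    return hits / len(examples)


# Toy data, invented for illustration only.
toy = [
    ("Great, another Monday 🙄", True),
    ("I just love being stuck in traffic", True),
    ("Lovely weather today ☀", False),
    ("Best service ever 😂", True),
]

fraction = ironic_emoji_fraction(toy)
print(f"ironic tweets with emoji: {fraction:.0%} of the dataset")
```

A low value on a real corpus would indicate the kind of imbalance the abstract describes, where a classifier sees too few ironic examples with emoji to learn from them.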



