Using Author Embeddings to Improve Tweet Stance Classification

NUT@EMNLP, Pub Date: 2018-11-01, DOI: 10.18653/v1/W18-6124

Adrian Benton, Mark Dredze

Citations: 20

Abstract

Many social media classification tasks analyze the content of a message but do not consider its context. For example, in tweet stance classification, where a tweet is categorized according to the viewpoint it espouses, the expressed viewpoint depends on latent beliefs held by the user. In this paper we investigate whether incorporating knowledge about the author can improve tweet stance classification. Furthermore, since author information and embeddings are often unavailable for labeled training examples, we propose a semi-supervised pre-training method to predict user embeddings. Although the neural stance classifiers we learn are often outperformed by a baseline SVM, author embedding pre-training yields improvements over a non-pre-trained neural network on four out of five domains in the SemEval 2016 6A tweet stance classification task. On a gun control tweet stance classification dataset, improvements from pre-training are only apparent when training data is limited.
