Topic Enhanced Word Embedding for Toxic Content Detection in Q&A Sites

2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Pub Date: 2019-08-01. DOI: 10.1145/3341161.3345332

Do Yeon Kim, Xiaohan Li, Sheng Wang, Yunying Zhuo, R. Lee

Citations: 9

Abstract

Increasingly, users are adopting community question-and-answer (Q&A) sites to exchange information. Detecting and eliminating toxic and divisive content on these Q&A sites are paramount tasks for ensuring a safe and constructive environment for users. An insincere question, one founded upon false premises, is one type of toxic content on Q&A sites. In this paper, we propose a novel deep learning framework that enhances pre-trained word embeddings with topical information for insincere question classification. We evaluate the proposed framework on a large real-world dataset from the Quora Q&A site and show that the topically enhanced word embedding achieves better results in toxic content classification. We also conduct an empirical study of the topics of insincere questions on Quora and find that the topics “religion”, “gender” and “politics” have a higher proportion of insincere questions.
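
The abstract does not say how topical information is combined with the pre-trained embeddings. As a rough illustration only, the sketch below assumes one simple possibility: concatenating each word's pre-trained vector with a per-word topic distribution. The function name topic_enhanced_embeddings, the toy vocabulary, the dimensions, and the random data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def topic_enhanced_embeddings(word_vectors, word_topic_probs):
    """Concatenate each word's pre-trained vector with its topic distribution.

    ASSUMPTION: concatenation is only one plausible way to add topical
    information; the paper's actual mechanism is not described in the abstract.

    word_vectors:     (vocab_size, embed_dim) pre-trained embeddings.
    word_topic_probs: (vocab_size, num_topics) per-word topic distributions,
                      e.g. produced by some topic model (hypothetical input).
    """
    word_vectors = np.asarray(word_vectors, dtype=float)
    word_topic_probs = np.asarray(word_topic_probs, dtype=float)
    if word_vectors.shape[0] != word_topic_probs.shape[0]:
        raise ValueError("both inputs must cover the same vocabulary")
    # Result has shape (vocab_size, embed_dim + num_topics).
    return np.concatenate([word_vectors, word_topic_probs], axis=1)


# Toy data standing in for real embeddings and real topic-model output.
rng = np.random.default_rng(0)
vocab = ["religion", "gender", "politics", "weather"]
pretrained = rng.normal(size=(len(vocab), 5))             # 5-dim "pre-trained" vectors
topic_probs = rng.dirichlet(np.ones(3), size=len(vocab))  # 3 hypothetical topics
enhanced = topic_enhanced_embeddings(pretrained, topic_probs)
```

Under this assumption, the enhanced matrix would replace the plain pre-trained matrix as the embedding layer of a downstream classifier, so each word carries both its distributional meaning and a signal about which topics it tends to appear in.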


