Do Yeon Kim, Xiaohan Li, Sheng Wang, Yunying Zhuo, R. Lee
2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Published 2019-08-01. DOI: https://doi.org/10.1145/3341161.3345332
Topic Enhanced Word Embedding for Toxic Content Detection in Q&A Sites
Increasingly, users are adopting community question-and-answer (Q&A) sites to exchange information. Detecting and eliminating toxic and divisive content on these Q&A sites is paramount to ensuring a safe and constructive environment for users. Insincere questions, which are founded upon false premises, are one type of toxic content on Q&A sites. In this paper, we propose a novel deep learning framework that enhances pre-trained word embeddings with topical information for insincere question classification. We evaluated the proposed framework on a large real-world dataset from the Quora Q&A site and showed that the topically enhanced word embedding achieves better results in toxic content classification. An empirical study was also conducted to analyze the topics of insincere questions on Quora, and we found that topics on "religion", "gender" and "politics" have a higher proportion of insincere questions.
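The abstract does not say how topical information is combined with the pre-trained embeddings. As a purely illustrative sketch, one simple possibility is to concatenate each word's pre-trained vector with a per-word topic distribution before passing the sequence to a classifier. The function name `topic_enhanced_embeddings`, the fallback rules for unknown words, and all numeric values below are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def topic_enhanced_embeddings(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    word_topics: Dict[str, List[float]],
    dim: int,
    n_topics: int,
) -> List[List[float]]:
    """Concatenate each token's word vector with its topic vector.

    Assumption for this sketch: a token missing from `word_vectors` gets a
    zero vector, and a token missing from `word_topics` gets a uniform
    topic distribution.
    """
    zero_vec = [0.0] * dim
    uniform_topics = [1.0 / n_topics] * n_topics
    return [
        list(word_vectors.get(tok, zero_vec)) + list(word_topics.get(tok, uniform_topics))
        for tok in tokens
    ]


# Toy, made-up values for illustration only (not from the paper).
word_vectors = {"why": [0.1, 0.2, 0.3], "religion": [0.4, 0.5, 0.6]}
word_topics = {"religion": [0.9, 0.1]}

enhanced = topic_enhanced_embeddings(
    ["why", "religion", "xyz"], word_vectors, word_topics, dim=3, n_topics=2
)
for row in enhanced:
    print(row)
```

Each output row has `dim + n_topics` entries, so a downstream model sees both the general-purpose embedding and the topic signal for every token. Other ways of combining the two are equally plausible.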