Fine-Grained Analysis of Cyberbullying Using Weakly-Supervised Topic Models

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). Pub Date: 2018-10-01. DOI: 10.1109/DSAA.2018.00065

Yue Zhang, Arti Ramesh


Citations: 8

Abstract

The possibility of anonymity and the lack of effective ways to identify inappropriate messages have resulted in a significant amount of online interaction data that attempts to harass, bully, or offend the recipient. In this work, we perform a fine-grained quantitative and qualitative linguistic analysis of messages exchanged on one such recent web/smartphone application, Sarahah, which allows friends to exchange messages anonymously. We first develop a weakly supervised hierarchical framework using seeded topic models to automatically categorize Sarahah messages into coarse and fine-grained bullying categories. Our linguistic analysis reveals that a significant share of the messages exchanged on this platform (~20%) include inappropriate, hurtful, or profane language intended to embarrass, offend, or bully the recipient. We then present a detailed analysis of the messages and the corresponding users' responses in the different bullying categories, comparing them across linguistic and psychological attributes such as sentiment and the psycho-linguistic categories from Linguistic Inquiry and Word Count (LIWC). Finally, we compare messages exchanged on Sarahah with an existing labeled cyberbullying dataset from the Formspring social network in terms of severity of bullying, coarse-grained bullying categories, and anonymity. Our analysis sheds light on the different categories of bullying and the effect each category has on the recipient, and helps quantify the types and amounts of negativity in online social media.
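
As a rough illustration of the hierarchical, seed-driven categorization described in the abstract, the Python sketch below assigns a coarse label first and a fine-grained label only to messages flagged as bullying. It is not the authors' method: plain seed-word counting stands in for the seeded topic model, and every seed word, category name, and function name here is hypothetical, since the abstract does not list them.

```python
import re
from collections import Counter

# Hypothetical seed lexicons (assumption): the paper's actual seed words
# and category names are not given in the abstract.
COARSE_SEEDS = {
    "bullying": {"stupid", "ugly", "hate", "loser", "hurt"},
    "benign": {"love", "nice", "friend", "kind", "great"},
}
# Hypothetical fine-grained categories, applied only under "bullying".
FINE_SEEDS = {
    "insult": {"stupid", "ugly", "loser"},
    "threat": {"hurt", "hate"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def best_category(tokens, seeds, default):
    """Return the category whose seed words occur most often, else default."""
    counts = Counter(tokens)
    scores = {cat: sum(counts[w] for w in words) for cat, words in seeds.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def categorize(message):
    """Two-level labeling: coarse category first, fine category if bullying."""
    tokens = tokenize(message)
    coarse = best_category(tokens, COARSE_SEEDS, default="benign")
    fine = None
    if coarse == "bullying":
        fine = best_category(tokens, FINE_SEEDS, default="other")
    return coarse, fine
```

Calling `categorize("You are such a stupid loser")` would walk both levels of the hierarchy, while a message with no seed words falls back to the default coarse label. A real seeded topic model would instead learn word distributions beyond the seeds, which simple counting cannot do.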

