Determining Linguistic Features of Hate Speech from 2016 Philippine Election-Related Tweets

2023 International Conference on IT Innovation and Knowledge Discovery (ITIKD). Pub Date: 2023-03-08. DOI: 10.1109/ITIKD56332.2023.10100008

Raphael Christen K. Enriquez, M. R. Estuar

Abstract

Hate speech is characterized as a deliberate attack directed towards a group of people, motivated by aspects of the group's identity. There is a growing interest in solutions involving automatic hate speech detection in response to the proliferation of hate speech. However, most automatic hate speech detection tools are designed for high-resource languages such as English, which results in challenges in detecting hate speech in low-resource languages such as Filipino. Social media users within the Philippines predominantly use their native language or a code-switched variation such as Taglish as the preferred linguistic style in online communication. This study seeks to determine linguistic features that characterize hate speech in the Philippine setting. The study characterizes hate speech using the following features: bilingual, part-of-speech, and psycho-linguistic features. Feature extraction was performed using fastText, NLTK (Natural Language Toolkit), and LIWC (Linguistic Inquiry and Word Count) on an existing Filipino hate speech corpus collected during the 2016 Philippine Presidential Elections. Results show that hate speech in this dataset has significantly different features from non-hate speech. Specifically, the distinct features include language dominance, frequency of code-switching, frequency of parts of speech, and LIWC's summary variables and psychological processes. These features, which have been demonstrated to be statistically different between hate speech and non-hate speech, can be leveraged to augment existing hate speech detection models, particularly within low-resource language contexts.
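
The abstract does not spell out how the bilingual features are computed, so the following is only a minimal sketch of one plausible reading of "language dominance" and "frequency of code-switching". It assumes each token can be given a language label; here a tiny hypothetical word list stands in for a trained language identifier such as the fastText model the authors mention. The lexicons, function names, and feature definitions are illustrative assumptions, not the paper's method.

```python
from collections import Counter

# Hypothetical toy lexicons (assumption): a real pipeline would use a trained
# language identifier instead of hand-written word lists.
TAGALOG = {"ang", "ng", "sa", "mga", "hindi", "ako", "siya", "naman", "talaga"}
ENGLISH = {"the", "is", "you", "are", "so", "not", "vote", "for", "him", "very"}


def label_token(token):
    """Return 'tl', 'en', or 'other' for a single whitespace-separated token."""
    word = token.lower().strip(".,!?\"'")
    if word in TAGALOG:
        return "tl"
    if word in ENGLISH:
        return "en"
    return "other"


def bilingual_features(text):
    """Compute illustrative language-dominance and code-switching features."""
    labels = [label_token(t) for t in text.split()]
    # Ignore tokens whose language could not be determined.
    known = [lab for lab in labels if lab != "other"]
    counts = Counter(known)
    # A switch is any adjacent pair of labeled tokens in different languages.
    switches = sum(1 for a, b in zip(known, known[1:]) if a != b)
    dominant = counts.most_common(1)[0][0] if counts else None
    return {
        "dominant_language": dominant,
        "dominance_ratio": counts[dominant] / len(known) if known else 0.0,
        "switch_count": switches,
        "switch_rate": switches / (len(known) - 1) if len(known) > 1 else 0.0,
    }


if __name__ == "__main__":
    print(bilingual_features("Hindi naman talaga siya vote for him"))
```

Features like these could then be compared between the hate and non-hate subsets with a statistical test; the abstract does not say which test the authors used.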
