Determining Linguistic Features of Hate Speech from 2016 Philippine Election-Related Tweets

Raphael Christen K. Enriquez, M. R. Estuar
2023 International Conference on IT Innovation and Knowledge Discovery (ITIKD). Published 2023-03-08. DOI: 10.1109/ITIKD56332.2023.10100008 (https://doi.org/10.1109/ITIKD56332.2023.10100008)
Citations: 0

Abstract

Hate speech is characterized as a deliberate attack directed at a group of people and motivated by aspects of the group's identity. In response to the proliferation of hate speech, there is growing interest in automatic hate speech detection. However, most automatic detection tools are designed for high-resource languages such as English, which makes it challenging to detect hate speech in low-resource languages such as Filipino. Social media users in the Philippines predominantly use their native language or a code-switched variety such as Taglish in online communication. This study seeks to determine the linguistic features that characterize hate speech in the Philippine setting. It characterizes hate speech using three kinds of features: bilingual, part-of-speech, and psycholinguistic features. Features were extracted with fastText, NLTK (Natural Language Toolkit), and LIWC (Linguistic Inquiry and Word Count) from an existing Filipino hate speech corpus collected during the 2016 Philippine presidential elections. Results show that hate speech in this dataset differs significantly from non-hate speech. Specifically, the distinguishing features include language dominance, frequency of code-switching, frequency of parts of speech, and LIWC's summary variables and psychological processes. These features, shown to differ statistically between hate speech and non-hate speech, can be leveraged to augment existing hate speech detection models, particularly in low-resource language contexts.
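The abstract names language dominance and frequency of code-switching as bilingual features but does not say how they are computed. As a minimal sketch, assuming each token of a tweet has already been assigned a language label (for instance by a language identifier), the two measures might be computed as below. The function names, the label codes "tl" and "en", and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def language_dominance(labels):
    """Return (dominant_label, share_of_tokens) for a list of per-token labels.

    Assumption: dominance is read here as the most frequent language label.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


def code_switch_count(labels):
    """Count adjacent token pairs whose language labels differ."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


# Hypothetical per-token labels for one Taglish tweet
# ("tl" = Tagalog, "en" = English); not data from the study.
sample_labels = ["tl", "tl", "en", "tl", "tl", "en"]

dominant, share = language_dominance(sample_labels)
switches = code_switch_count(sample_labels)
print(dominant, round(share, 2), switches)
```

Counting label changes between adjacent tokens is only one possible reading of "frequency of code-switching"; normalizing by tweet length, or counting switches at sentence boundaries only, would be equally plausible alternatives.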