Hate Speech Detection in Thai Social Media with Ordinal-Imbalanced Text Classification

2022 19th International Joint Conference on Computer Science and Software Engineering (JCSSE). Pub Date: 2022-06-22. DOI: 10.1109/jcsse54890.2022.9836312

Kitsuchart Pasupa, Werasut Karnbanjob, Massakorn Aksornsiri


Citations: 1

Abstract

Cyberbullying has become a serious problem on Thai social media. For example, during the COVID-19 pandemic some Thai people posted hate speech about Myanmar workers in Thailand, which might escalate into hate crime. Detecting cyberbullying on Thai social media is therefore imperative and urgent. The task is a text classification problem. Moreover, hate speech has an inherent order of severity levels, but many existing works do not account for this ordering in their models. We therefore developed a Thai hate-speech classification method and tested it with various loss functions in order to detect such hate speech accurately. We evaluated the resulting models on a corpus of ordinal-imbalanced Thai text. The results indicated that the best model in terms of F1-score used a hybrid loss function combining an ordinal regression loss with the Pearson correlation coefficient (commonly used as a similarity function). It yielded an average F1-score of 78.38% (0.88% significantly higher than the score achieved by a conventional loss function) and an average mean squared error of 0.2478 (a 5.49% relative improvement). Thus, the proposed hybrid loss function improved the performance of the model.
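The abstract names the two ingredients of the hybrid loss but does not give its formula, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes (hypothetically) a cumulative-threshold ordinal loss over K-1 binary "severity > j" outputs, a Pearson term defined as 1 - r between predicted and true severity within a batch, and a mixing weight `alpha`. All function names and parameters here are invented for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordinal_targets(labels, num_classes):
    # Encode severity k as K-1 binary targets: [k > 0, k > 1, ..., k > K-2].
    thresholds = np.arange(num_classes - 1)
    return (labels[:, None] > thresholds[None, :]).astype(float)

def ordinal_loss(logits, labels):
    # Assumed ordinal component: binary cross-entropy averaged over the
    # K-1 cumulative "label > j" tasks (logits has shape [batch, K-1]).
    targets = ordinal_targets(labels, logits.shape[1] + 1)
    probs = sigmoid(logits)
    eps = 1e-12
    bce = -(targets * np.log(probs + eps)
            + (1.0 - targets) * np.log(1.0 - probs + eps))
    return bce.mean()

def pearson_loss(logits, labels):
    # Assumed correlation component: 1 - Pearson r between the predicted
    # severity (expected number of thresholds exceeded) and the true labels.
    pred = sigmoid(logits).sum(axis=1)
    p = pred - pred.mean()
    y = labels - labels.mean()
    r = (p * y).sum() / (np.sqrt((p ** 2).sum() * (y ** 2).sum()) + 1e-12)
    return 1.0 - r

def hybrid_loss(logits, labels, alpha=0.5):
    # Hypothetical convex combination; the paper's weighting is not stated
    # in the abstract.
    return (alpha * ordinal_loss(logits, labels)
            + (1.0 - alpha) * pearson_loss(logits, labels))

# Toy batch: four posts with severity levels 0..3 (K = 4 classes).
labels = np.array([0, 1, 2, 3])
good_logits = ordinal_targets(labels, 4) * 10.0 - 5.0  # agrees with labels
bad_logits = -good_logits                               # reversed ordering
print(hybrid_loss(good_logits, labels), hybrid_loss(bad_logits, labels))
```

In this sketch the Pearson term rewards predictions that rank posts in the same order as their true severity, while the threshold term penalizes each misplaced severity boundary. That is one plausible reading of why combining the two could help on ordinal labels.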
