仇恨言论类型识别的模糊多任务学习

The World Wide Web Conference Pub Date : 2019-05-13 DOI:10.1145/3308558.3313546

Han Liu, P. Burnap, Wafa Alorainy, M. Williams

{"title":"仇恨言论类型识别的模糊多任务学习","authors":"Han Liu, P. Burnap, Wafa Alorainy, M. Williams","doi":"10.1145/3308558.3313546","DOIUrl":null,"url":null,"abstract":"In traditional machine learning, classifiers training is typically undertaken in the setting of single-task learning, so the trained classifier can discriminate between different classes. However, this must be based on the assumption that different classes are mutually exclusive. In real applications, the above assumption does not always hold. For example, the same book may belong to multiple subjects. From this point of view, researchers were motivated to formulate multi-label learning problems. In this context, each instance can be assigned multiple labels but the classifiers training is still typically undertaken in the setting of single-task learning. When probabilistic approaches are adopted for classifiers training, multi-task learning can be enabled through transformation of a multi-labelled data set into several binary data sets. The above data transformation could usually result in the class imbalance issue. Without the above data transformation, multi-labelling of data results in an exponential increase of the number of classes, leading to fewer instances for each class and a higher difficulty for identifying each class. In addition, multi-labelling of data is very time consuming and expensive in some application areas, such as hate speech detection. In this paper, we introduce a novel formulation of the hate speech type identification problem in the setting of multi-task learning through our proposed fuzzy ensemble approach. In this setting, single-labelled data can be used for semi-supervised multi-label learning and two new metrics (detection rate and irrelevance rate) are thus proposed to measure more effectively the performance for this kind of learning tasks. We report an experimental study on identification of four types of hate speech, namely: religion, race, disability and sexual orientation. The experimental results show that our proposed fuzzy ensemble approach outperforms other popular probabilistic approaches, with an overall detection rate of 0.93.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Fuzzy Multi-task Learning for Hate Speech Type Identification\",\"authors\":\"Han Liu, P. Burnap, Wafa Alorainy, M. Williams\",\"doi\":\"10.1145/3308558.3313546\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In traditional machine learning, classifiers training is typically undertaken in the setting of single-task learning, so the trained classifier can discriminate between different classes. However, this must be based on the assumption that different classes are mutually exclusive. In real applications, the above assumption does not always hold. For example, the same book may belong to multiple subjects. From this point of view, researchers were motivated to formulate multi-label learning problems. In this context, each instance can be assigned multiple labels but the classifiers training is still typically undertaken in the setting of single-task learning. When probabilistic approaches are adopted for classifiers training, multi-task learning can be enabled through transformation of a multi-labelled data set into several binary data sets. The above data transformation could usually result in the class imbalance issue. Without the above data transformation, multi-labelling of data results in an exponential increase of the number of classes, leading to fewer instances for each class and a higher difficulty for identifying each class. In addition, multi-labelling of data is very time consuming and expensive in some application areas, such as hate speech detection. In this paper, we introduce a novel formulation of the hate speech type identification problem in the setting of multi-task learning through our proposed fuzzy ensemble approach. In this setting, single-labelled data can be used for semi-supervised multi-label learning and two new metrics (detection rate and irrelevance rate) are thus proposed to measure more effectively the performance for this kind of learning tasks. We report an experimental study on identification of four types of hate speech, namely: religion, race, disability and sexual orientation. The experimental results show that our proposed fuzzy ensemble approach outperforms other popular probabilistic approaches, with an overall detection rate of 0.93.\",\"PeriodicalId\":23013,\"journal\":{\"name\":\"The World Wide Web Conference\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The World Wide Web Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3308558.3313546\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The World Wide Web Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308558.3313546","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

摘要

在传统的机器学习中，分类器的训练通常是在单任务学习的环境下进行的，因此训练出来的分类器可以区分不同的类别。然而，这必须基于不同的类是互斥的假设。在实际应用中，上述假设并不总是成立。例如，同一本书可能属于多个主题。从这个角度来看，研究人员被激励去制定多标签学习问题。在这种情况下，每个实例可以分配多个标签，但分类器的训练仍然通常在单任务学习的环境中进行。当采用概率方法进行分类器训练时，可以通过将多标记数据集转换为多个二值数据集来实现多任务学习。上述数据转换通常会导致类不平衡问题。如果不进行上述数据转换，对数据进行多重标记会导致类的数量呈指数增长，导致每个类的实例数量减少，识别每个类的难度增加。此外，在一些应用领域，如仇恨语音检测中，数据的多重标记非常耗时和昂贵。在本文中，我们通过我们提出的模糊集成方法引入了多任务学习环境下仇恨言论类型识别问题的新公式。在这种情况下，单标签数据可以用于半监督多标签学习，因此提出了两个新的指标(检测率和不相关率)来更有效地衡量这类学习任务的性能。我们报告了一项关于识别四种类型的仇恨言论的实验研究，即:宗教，种族，残疾和性取向。实验结果表明，本文提出的模糊集成方法优于其他常用的概率方法，总体检测率为0.93。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Fuzzy Multi-task Learning for Hate Speech Type Identification

In traditional machine learning, classifiers training is typically undertaken in the setting of single-task learning, so the trained classifier can discriminate between different classes. However, this must be based on the assumption that different classes are mutually exclusive. In real applications, the above assumption does not always hold. For example, the same book may belong to multiple subjects. From this point of view, researchers were motivated to formulate multi-label learning problems. In this context, each instance can be assigned multiple labels but the classifiers training is still typically undertaken in the setting of single-task learning. When probabilistic approaches are adopted for classifiers training, multi-task learning can be enabled through transformation of a multi-labelled data set into several binary data sets. The above data transformation could usually result in the class imbalance issue. Without the above data transformation, multi-labelling of data results in an exponential increase of the number of classes, leading to fewer instances for each class and a higher difficulty for identifying each class. In addition, multi-labelling of data is very time consuming and expensive in some application areas, such as hate speech detection. In this paper, we introduce a novel formulation of the hate speech type identification problem in the setting of multi-task learning through our proposed fuzzy ensemble approach. In this setting, single-labelled data can be used for semi-supervised multi-label learning and two new metrics (detection rate and irrelevance rate) are thus proposed to measure more effectively the performance for this kind of learning tasks. We report an experimental study on identification of four types of hate speech, namely: religion, race, disability and sexual orientation. The experimental results show that our proposed fuzzy ensemble approach outperforms other popular probabilistic approaches, with an overall detection rate of 0.93.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The World Wide Web Conference

自引率

0.00%

发文量