{"title":"DDML: Multi-Student Knowledge Distillation for Hate Speech.","authors":"Ze Liu, Zerui Shao, Haizhou Wang, Beibei Li","doi":"10.3390/e27040417","DOIUrl":null,"url":null,"abstract":"<p><p>Recent studies have shown that hate speech on social media negatively impacts users' mental health and is a contributing factor to suicide attempts. On a broader scale, online hate speech can undermine social stability. With the continuous growth of the internet, the prevalence of online hate speech is rising, making its detection an urgent issue. Recent advances in natural language processing, particularly with transformer-based models, have shown significant promise in hate speech detection. However, these models come with a large number of parameters, leading to high computational requirements and making them difficult to deploy on personal computers. To address these challenges, knowledge distillation offers a solution by training smaller student networks using larger teacher networks. Recognizing that learning also occurs through peer interactions, we propose a knowledge distillation method called Deep Distill-Mutual Learning (DDML). DDML employs one teacher network and two or more student networks. While the student networks benefit from the teacher's knowledge, they also engage in mutual learning with each other. We trained numerous deep neural networks for hate speech detection based on DDML and demonstrated that these networks perform well across various datasets. We tested our method across ten languages and nine datasets. The results demonstrate that DDML enhances the performance of deep neural networks, achieving an average F1 score increase of 4.87% over the baseline.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 4","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12025758/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27040417","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
Abstract
Recent studies have shown that hate speech on social media negatively impacts users' mental health and is a contributing factor to suicide attempts. On a broader scale, online hate speech can undermine social stability. With the continuous growth of the internet, the prevalence of online hate speech is rising, making its detection an urgent issue. Recent advances in natural language processing, particularly with transformer-based models, have shown significant promise in hate speech detection. However, these models come with a large number of parameters, leading to high computational requirements and making them difficult to deploy on personal computers. To address these challenges, knowledge distillation offers a solution by training smaller student networks using larger teacher networks. Recognizing that learning also occurs through peer interactions, we propose a knowledge distillation method called Deep Distill-Mutual Learning (DDML). DDML employs one teacher network and two or more student networks. While the student networks benefit from the teacher's knowledge, they also engage in mutual learning with each other. We trained numerous deep neural networks for hate speech detection based on DDML and demonstrated that these networks perform well across various datasets. We tested our method across ten languages and nine datasets. The results demonstrate that DDML enhances the performance of deep neural networks, achieving an average F1 score increase of 4.87% over the baseline.
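To make the described training scheme concrete, below is a minimal PyTorch sketch of a per-student loss that combines hard-label cross-entropy, distillation from a frozen teacher, and mutual learning with a peer student. The loss weights, temperature, and all variable names are illustrative assumptions for a one-teacher / two-student setup, not values taken from the paper.

```python
# Sketch of a DDML-style per-student loss: CE + distillation from the teacher
# + mutual learning with a peer student. Hyperparameters are assumed, not
# taken from the paper.
import torch
import torch.nn.functional as F

def ddml_student_loss(student_logits, peer_logits, teacher_logits, labels,
                      T=2.0, alpha=0.5, beta=0.5):
    # Supervised loss on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Distillation: match the teacher's softened distribution (teacher frozen).
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits.detach() / T, dim=-1),
                  reduction="batchmean") * (T * T)
    # Mutual learning: match the peer student's current predictions.
    ml = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(peer_logits.detach(), dim=-1),
                  reduction="batchmean")
    return ce + alpha * kd + beta * ml

# Usage (per batch): compute logits from the frozen teacher and both students,
# then update each student with its own combined loss.
# loss_s1 = ddml_student_loss(logits_s1, logits_s2, logits_teacher, labels)
# loss_s2 = ddml_student_loss(logits_s2, logits_s1, logits_teacher, labels)
```

With more than two students, the mutual-learning term would typically average the KL divergence to each peer; the two-student case above is kept only for brevity.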
About the journal:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Our aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. If computational or experimental work is involved, the details must be provided so that the results can be reproduced.