A Toxic Euphemism Detection framework for online social network based on Semantic Contrastive Learning and dual channel knowledge augmentation
Gang Zhou, Haizhou Wang, Di Jin, Wenxian Wang, Shuyu Jiang, Rui Tang, Xingshu Chen
Information Processing & Management, vol. 62, no. 4, Article 104143 (published 2025-03-26)
DOI: 10.1016/j.ipm.2025.104143
URL: https://www.sciencedirect.com/science/article/pii/S0306457325000846
Impact factor: 7.4; JCR: Q1 (Computer Science, Information Systems)
Citations: 0
Abstract
For real-time content moderation systems, detecting toxic euphemisms remains a significant challenge, both because annotated datasets are scarce and because existing methods struggle to identify euphemistic toxicity in depth. In this paper, we propose the TED-SCL framework (Toxic Euphemism Detection based on Semantic Contrastive Learning) to address these problems. First, we collected nearly 8 million comments and constructed a toxic euphemism dataset (TE-Dataset) containing 18,971 comments that cover six topics and 424 PTETs (Potential Toxic Euphemism Terms). Next, we employ contrastive learning to separate toxic euphemism samples from harmless ones in semantic space, which enhances the model’s ability to capture subtle differences. Finally, we use a dual channel knowledge augmentation module to integrate background knowledge with toxic comments and improve the identification of toxic euphemisms. Experimental results show that TED-SCL outperforms existing state-of-the-art methods on toxic euphemism detection tasks, achieving an accuracy of 93.94%, a recall of 93.36%, and an F1 score of 93.23%. Furthermore, TED-SCL demonstrates better generalization, zero-shot capability, and robustness across different topics and datasets, offering real-time content moderation systems a new way to detect euphemistic and implicit toxicity effectively.
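To make the contrastive step concrete, the sketch below shows a generic supervised contrastive loss, which pulls embeddings with the same label together and pushes embeddings with different labels apart. It is an illustration only and not the TED-SCL implementation: the abstract does not specify the loss, so the function name `supervised_contrastive_loss`, the NumPy formulation, the toy embeddings, and the temperature value are all assumptions.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative only).

    embeddings: (n, d) array of non-zero sentence vectors, n >= 2.
    labels:     (n,) array, e.g. 1 = toxic euphemism, 0 = harmless.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)
    n = len(y)

    # L2-normalise so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # A sample is never its own positive or negative: mask the diagonal.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Row-wise log-softmax, shifted by the row maximum for stability.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives are the other samples that share the anchor's label.
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive pair in this batch

    # Average log-probability of the positives, per anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


if __name__ == "__main__":
    # Toy 2-D "embeddings": two tight clusters (hypothetical data).
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print("labels match clusters:", supervised_contrastive_loss(emb, [1, 1, 0, 0]))
    print("labels cut across    :", supervised_contrastive_loss(emb, [1, 0, 1, 0]))
```

The loss is small when the labels agree with the geometry of the embedding space and large when they cut across it. Minimising a loss of this kind is what would drive toxic and harmless samples apart during training.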
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.