从大型语言模型中提取知识：仇恨和反语音识别的概念瓶颈模型

IF 6.9 1区管理学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Processing & Management Pub Date : 2025-09-09 DOI:10.1016/j.ipm.2025.104309

Roberto Labadie-Tamayo , Djordje Slijepčević , Xihui Chen , Adrian Jaques Böck , Andreas Babic , Liz Freimann , Christiane Atzmüller , Matthias Zeppelzauer

{"title":"从大型语言模型中提取知识：仇恨和反语音识别的概念瓶颈模型","authors":"Roberto Labadie-Tamayo , Djordje Slijepčević , Xihui Chen , Adrian Jaques Böck , Andreas Babic , Liz Freimann , Christiane Atzmüller , Matthias Zeppelzauer","doi":"10.1016/j.ipm.2025.104309","DOIUrl":null,"url":null,"abstract":"<div><div>The rapid increase in hate speech on social media has exposed an unprecedented impact on society, making automated methods for detecting such content important. Unlike prior black-box models, we propose a novel transparent method for automated hate and counter speech recognition, i.e., “Speech Concept Bottleneck Model” (SCBM), using adjectives as human-interpretable bottleneck concepts. SCBM leverages large language models (LLMs) to map input texts to an abstract adjective-based representation, which is then sent to a light-weight classifier for downstream tasks. Across five benchmark datasets spanning multiple languages and platforms (e.g., Twitter, Reddit, YouTube), SCBM achieves an average macro-F1 score of 0.69 which outperforms the most recently reported results from the literature on four out of five datasets. Aside from high recognition accuracy, SCBM provides a high level of both local and global interpretability. Furthermore, fusing our adjective-based concept representation with transformer embeddings, leads to a 1.8% performance increase on average across all datasets, showing that the proposed representation captures complementary information. Our results demonstrate that adjective-based concept representations can serve as compact, interpretable, and effective encodings for hate and counter speech recognition. With adapted adjectives, our method can also be applied to other NLP tasks.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104309"},"PeriodicalIF":6.9000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distilling knowledge from large language models: A concept bottleneck model for hate and counter speech recognition\",\"authors\":\"Roberto Labadie-Tamayo , Djordje Slijepčević , Xihui Chen , Adrian Jaques Böck , Andreas Babic , Liz Freimann , Christiane Atzmüller , Matthias Zeppelzauer\",\"doi\":\"10.1016/j.ipm.2025.104309\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The rapid increase in hate speech on social media has exposed an unprecedented impact on society, making automated methods for detecting such content important. Unlike prior black-box models, we propose a novel transparent method for automated hate and counter speech recognition, i.e., “Speech Concept Bottleneck Model” (SCBM), using adjectives as human-interpretable bottleneck concepts. SCBM leverages large language models (LLMs) to map input texts to an abstract adjective-based representation, which is then sent to a light-weight classifier for downstream tasks. Across five benchmark datasets spanning multiple languages and platforms (e.g., Twitter, Reddit, YouTube), SCBM achieves an average macro-F1 score of 0.69 which outperforms the most recently reported results from the literature on four out of five datasets. Aside from high recognition accuracy, SCBM provides a high level of both local and global interpretability. Furthermore, fusing our adjective-based concept representation with transformer embeddings, leads to a 1.8% performance increase on average across all datasets, showing that the proposed representation captures complementary information. Our results demonstrate that adjective-based concept representations can serve as compact, interpretable, and effective encodings for hate and counter speech recognition. With adapted adjectives, our method can also be applied to other NLP tasks.</div></div>\",\"PeriodicalId\":50365,\"journal\":{\"name\":\"Information Processing & Management\",\"volume\":\"63 2\",\"pages\":\"Article 104309\"},\"PeriodicalIF\":6.9000,\"publicationDate\":\"2025-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Processing & Management\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S030645732500250X\",\"RegionNum\":1,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S030645732500250X","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

社交媒体上仇恨言论的迅速增加对社会造成了前所未有的影响，使得检测此类内容的自动化方法变得非常重要。与之前的黑盒模型不同，我们提出了一种新的透明的自动仇恨和反语音识别方法，即“语音概念瓶颈模型”（SCBM），使用形容词作为人类可解释的瓶颈概念。SCBM利用大型语言模型（llm）将输入文本映射到基于抽象形容词的表示，然后将其发送到轻量级分类器以执行下游任务。在跨越多种语言和平台的五个基准数据集（例如，Twitter, Reddit, YouTube）中，SCBM实现了0.69的平均宏观f1分数，优于最近报道的五个数据集中四个的文献结果。除了高识别精度外，SCBM还提供了高水平的局部和全局可解释性。此外，将我们基于形容词的概念表示与变压器嵌入融合在一起，在所有数据集上的性能平均提高了1.8%，这表明所提出的表示捕获了互补信息。我们的研究结果表明，基于形容词的概念表示可以作为仇恨和反语音识别的紧凑、可解释和有效的编码。通过调整形容词，我们的方法也可以应用于其他NLP任务。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Distilling knowledge from large language models: A concept bottleneck model for hate and counter speech recognition

The rapid increase in hate speech on social media has exposed an unprecedented impact on society, making automated methods for detecting such content important. Unlike prior black-box models, we propose a novel transparent method for automated hate and counter speech recognition, i.e., “Speech Concept Bottleneck Model” (SCBM), using adjectives as human-interpretable bottleneck concepts. SCBM leverages large language models (LLMs) to map input texts to an abstract adjective-based representation, which is then sent to a light-weight classifier for downstream tasks. Across five benchmark datasets spanning multiple languages and platforms (e.g., Twitter, Reddit, YouTube), SCBM achieves an average macro-F1 score of 0.69 which outperforms the most recently reported results from the literature on four out of five datasets. Aside from high recognition accuracy, SCBM provides a high level of both local and global interpretability. Furthermore, fusing our adjective-based concept representation with transformer embeddings, leads to a 1.8% performance increase on average across all datasets, showing that the proposed representation captures complementary information. Our results demonstrate that adjective-based concept representations can serve as compact, interpretable, and effective encodings for hate and counter speech recognition. With adapted adjectives, our method can also be applied to other NLP tasks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Information Processing & Management 工程技术-计算机：信息系统

CiteScore

17.00

自引率

11.60%

发文量

276

审稿时长

39 days

期刊介绍： Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.