Content moderation by LLM: from accuracy to legitimacy

Impact Factor: 13.9 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Tao Huang
{"title":"Content moderation by LLM: from accuracy to legitimacy","authors":"Tao Huang","doi":"10.1007/s10462-025-11328-1","DOIUrl":null,"url":null,"abstract":"<div><p>One trending application of LLM (large language model) is to use it for content moderation in online platforms. Most current studies on this application have focused on the metric of <i>accuracy</i>—the extent to which LLMs make correct decisions about content. This article argues that accuracy is insufficient and misleading because it fails to grasp the distinction between easy cases and hard cases, as well as the inevitable trade-offs in achieving higher accuracy. Closer examination reveals that content moderation is a constitutive part of platform governance, the key to which is to gain and enhance <i>legitimacy</i>. Instead of making moderation decisions correctly, the chief goal of LLMs is to make them legitimate. In this regard, this article proposes a paradigm shift from the single benchmark of accuracy towards a legitimacy-based framework for evaluating the performance of LLM moderators. The framework suggests that for easy cases, the key is to ensure accuracy, speed, and transparency, while for hard cases, what matters is reasoned justification and user participation. Examined under this framework, LLMs’ real potential in moderation is not accuracy improvement. Rather, LLMs can better contribute in four other aspects: to conduct screening of hard cases from easy cases, to provide quality explanations for moderation decisions, to assist human reviewers in getting more contextual information, and to facilitate user participation in a more interactive way. To realize these contributions, this article proposes a workflow for incorporating LLMs into the content moderation system. Using normative theories from law and social sciences to critically assess the new technological application, this article seeks to redefine LLMs’ role in content moderation and redirect relevant research in this field.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"58 10","pages":""},"PeriodicalIF":13.9000,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-025-11328-1.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-025-11328-1","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

One trending application of LLMs (large language models) is content moderation on online platforms. Most current studies of this application have focused on the metric of accuracy: the extent to which LLMs make correct decisions about content. This article argues that accuracy is insufficient and misleading, because it fails to grasp the distinction between easy cases and hard cases, as well as the inevitable trade-offs involved in achieving higher accuracy. Closer examination reveals that content moderation is a constitutive part of platform governance, the key to which is to gain and enhance legitimacy. The chief goal of LLMs is therefore not to make moderation decisions correctly but to make them legitimately. In this regard, this article proposes a paradigm shift from the single benchmark of accuracy towards a legitimacy-based framework for evaluating the performance of LLM moderators. The framework suggests that for easy cases, the key is to ensure accuracy, speed, and transparency, while for hard cases, what matters is reasoned justification and user participation. Examined under this framework, LLMs' real potential in moderation lies not in accuracy improvement. Rather, LLMs can better contribute in four other respects: screening hard cases from easy cases, providing quality explanations for moderation decisions, assisting human reviewers in obtaining more contextual information, and facilitating user participation in a more interactive way. To realize these contributions, this article proposes a workflow for incorporating LLMs into the content moderation system. Using normative theories from law and the social sciences to critically assess this new technological application, the article seeks to redefine LLMs' role in content moderation and redirect relevant research in this field.
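The abstract's framework and four proposed LLM roles translate naturally into a pipeline shape. The following is a minimal Python sketch of how such a workflow might be wired together; it is not the paper's implementation, and every name in it (StubLLM, screen, moderate, the prompts) is a hypothetical stand-in for a real moderation stack and chat-completion client. The key design choice it illustrates is the one the abstract argues for: the LLM decides and explains only the easy cases, while for hard cases it gathers context and defers the decision to human reviewers.

```python
from dataclasses import dataclass
from enum import Enum
from queue import Queue
from typing import Optional

class Difficulty(Enum):
    EASY = "easy"
    HARD = "hard"

@dataclass
class ModerationResult:
    removed: Optional[bool]  # None while a hard case awaits human review
    explanation: str         # user-facing justification (Role 2)
    escalated: bool          # True when routed to human reviewers

class StubLLM:
    """Placeholder for any chat-completion client; ask() returns model text."""
    def ask(self, prompt: str) -> str:
        return "clearly allowed"  # canned answer so the sketch runs end to end

def screen(content: str, llm: StubLLM) -> Difficulty:
    """Role 1: screen hard cases from easy ones before any decision is made."""
    verdict = llm.ask(
        "Is the following content clearly allowed or clearly prohibited, "
        f"or does it require contextual judgment?\n{content}"
    )
    return Difficulty.EASY if "clearly" in verdict.lower() else Difficulty.HARD

def moderate(content: str, llm: StubLLM, review_queue: Queue) -> ModerationResult:
    if screen(content, llm) is Difficulty.EASY:
        # Easy cases: the framework calls for accuracy, speed, and transparency,
        # so decide immediately and attach a short explanation (Role 2).
        answer = llm.ask(f"Does this violate policy? Answer yes or no.\n{content}")
        explanation = llm.ask(f"Explain the decision in two sentences.\n{content}")
        return ModerationResult(answer.lower().startswith("yes"), explanation, escalated=False)
    # Hard cases: reasoned justification and user participation matter, so the
    # LLM only summarizes context for human reviewers (Role 3); interactive
    # user participation (Role 4) would happen downstream of this queue.
    context = llm.ask(f"Summarize the context a human reviewer would need.\n{content}")
    review_queue.put({"content": content, "context": context})
    return ModerationResult(removed=None, explanation="Escalated for human review.", escalated=True)

if __name__ == "__main__":
    print(moderate("example post", StubLLM(), Queue()))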

Source journal
Artificial Intelligence Review (CAS category: Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published per year: 194
Average review time: 5.3 months
Journal introduction: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.