A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules.

IF 3.7 | Zone 1 (Mathematics) | JCR Q1, STATISTICS & PROBABILITY

Annals of Statistics | Pub Date: 2025-02-01 | Epub Date: 2025-02-13 | DOI: 10.1214/24-aos2468

Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J Su

Volume 53, Issue 1, pages 322-351. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467635/pdf/


Abstract

Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provably distinguishing LLM-generated text from human-written text. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key, provided by the LLM to the verifier, to control the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks, one of which has been internally implemented at OpenAI, and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. Through numerical experiments, these theoretically derived detection rules are shown to be competitive with, and sometimes more powerful than, existing detection approaches.
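The pipeline the abstract describes (generate with a secret key, compute a pivotal statistic, choose a threshold that fixes the false positive rate) can be illustrated with a minimal Python sketch. It assumes a Gumbel-max-style watermark, which we take to be the scheme attributed to OpenAI (the abstract does not name it). Several simplifications are assumptions of this sketch, not the paper's setup: the pseudorandom numbers hash a hypothetical secret key with the token position (deployed schemes typically hash preceding tokens), `toy_probs` is a made-up stand-in for an LLM, and the score h(y) = -log(1 - y) is one common choice, not the optimal rule derived in the paper. The pivot is Y_t = U_{t, w_t}: if the text ignores the key, Y_t is Uniform(0, 1) whatever the model's distribution, so h(Y_t) ~ Exp(1), the sum of n scores is Gamma(n, 1), and the p-value is exact without knowing the model.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # hypothetical toy vocabulary {0, ..., 49}


def pseudo_uniform(secret: str, position: int, token: int) -> float:
    """Keyed pseudorandom number in (0, 1) for `token` at `position` (hash-based stand-in)."""
    digest = hashlib.sha256(f"{secret}|{position}|{token}".encode()).digest()
    x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits so the float is exact
    return (x + 0.5) / 2**52  # strictly inside (0, 1)


def toy_probs(prev_token: int) -> list[float]:
    """Hypothetical stand-in for an LLM's next-token distribution given the previous token."""
    rng = random.Random(1000 + prev_token)
    weights = [rng.expovariate(1.0) for _ in range(VOCAB_SIZE)]
    total = sum(weights)
    return [w / total for w in weights]


def generate_watermarked(secret: str, n: int, start: int = 0) -> list[int]:
    """Gumbel-max sampling: token = argmax_w U_w^(1/P_w), computed as argmax log(U_w) / P_w."""
    tokens, prev = [], start
    for t in range(n):
        probs = toy_probs(prev)
        prev = max(
            (w for w in range(VOCAB_SIZE) if probs[w] > 0),
            key=lambda w: math.log(pseudo_uniform(secret, t, w)) / probs[w],
        )
        tokens.append(prev)
    return tokens


def pivots(secret: str, tokens: list[int]) -> list[float]:
    """Pivotal statistic Y_t = U_{t, w_t}; i.i.d. Uniform(0, 1) if the text ignores the key."""
    return [pseudo_uniform(secret, t, w) for t, w in enumerate(tokens)]


def gamma_upper_tail(s: float, n: int) -> float:
    """P(Gamma(n, 1) >= s) = exp(-s) * sum_{k<n} s^k / k!, evaluated in log space."""
    if s <= 0:
        return 1.0
    logs = [-s + k * math.log(s) - math.lgamma(k + 1) for k in range(n)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


def detect(secret: str, tokens: list[int]) -> float:
    """p-value of the sum-of-scores test with h(y) = -log(1 - y) (Exp(1) under H0)."""
    score = sum(-math.log(1.0 - y) for y in pivots(secret, tokens))
    return gamma_upper_tail(score, len(tokens))


if __name__ == "__main__":
    secret = "demo-key"  # hypothetical secret key shared with the verifier
    watermarked = generate_watermarked(secret, n=200)
    rng = random.Random(0)
    human = [rng.randrange(VOCAB_SIZE) for _ in range(200)]  # independent of the key
    print(f"watermarked p-value: {detect(secret, watermarked):.3g}")
    print(f"human p-value:       {detect(secret, human):.3g}")
```

Rejecting H0 when the p-value falls below a level alpha keeps the false positive rate at alpha for any text that ignores the key. Swapping in a different score function h changes both the null distribution used for the threshold and the power against watermarked text; choosing h well is the question the paper's minimax formulation addresses.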




Source journal: Annals of Statistics (Mathematics: Statistics & Probability)

CiteScore: 9.30
Self-citation rate: 8.90%
Articles published: 119
Review time: 6-12 weeks

Journal description: The Annals of Statistics aims to publish research papers of the highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied and interdisciplinary statistics. Of course, many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics and in substantive scientific fields.