A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules.

IF 3.7 1区 数学 Q1 STATISTICS & PROBABILITY
Annals of Statistics Pub Date : 2025-02-01 Epub Date: 2025-02-13 DOI:10.1214/24-aos2468
Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J Su
{"title":"A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules.","authors":"Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J Su","doi":"10.1214/24-aos2468","DOIUrl":null,"url":null,"abstract":"<p><p>Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key-provided by the LLM to the verifier-to control the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks-one of which has been internally implemented at OpenAI-and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. These theoretically derived detection rules are demonstrated to be competitive and sometimes enjoy a higher power than existing detection approaches through numerical experiments.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"53 1","pages":"322-351"},"PeriodicalIF":3.7000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467635/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/24-aos2468","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/13 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

Abstract

Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key-provided by the LLM to the verifier-to control the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks-one of which has been internally implemented at OpenAI-and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. These theoretically derived detection rules are demonstrated to be competitive and sometimes enjoy a higher power than existing detection approaches through numerical experiments.

大型语言模型的水印统计框架:支点、检测效率和最优规则。
自ChatGPT于2022年11月推出以来,将(几乎)不明显的统计信号嵌入到大型语言模型(llm)生成的文本中,也称为水印,已被用作一种原则方法,用于从人类编写的对等文本中检测llm生成的文本。本文介绍了一种通用的、灵活的框架,用于推理水印的统计效率和设计强大的检测规则。受水印检测的假设检验公式的启发,我们的框架首先选择文本的关键统计量和由LLM提供给验证者的密钥,以控制误报率(错误地将人类编写的文本检测为LLM生成的错误)。接下来,该框架允许人们通过获得渐近假阴性率(错误地将llm生成的文本分类为人类编写的错误)的封闭形式表达式来评估水印检测规则的能力。我们的框架进一步将确定最优检测规则的问题简化为求解极大极小优化方案。我们将这个框架应用于两个代表性的水印(其中一个已经在openai内部实现),并获得了一些有助于指导实现水印实践的发现。特别地,我们在我们的框架下推导出这些水印的最优检测规则。这些理论推导的检测规则通过数值实验证明是有竞争力的,有时比现有的检测方法具有更高的功率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Annals of Statistics
Annals of Statistics 数学-统计学与概率论
CiteScore
9.30
自引率
8.90%
发文量
119
审稿时长
6-12 weeks
期刊介绍: The Annals of Statistics aim to publish research papers of highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied & interdisciplinary statistics. Of course many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics, and in substantive scientific fields.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信