Authors: Shen Wang; Jialiang Dong; Longfei Wu; Zhitao Guan
DOI: 10.1109/TASLP.2024.3487419
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4755-4767
Publication date: 2024-10-29 (Journal Article, Q1 ACOUSTICS)
URL: https://ieeexplore.ieee.org/document/10737446/
WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment
Large Language Models (LLMs) have shown incomparable representation and generalization capabilities, which have led to significant advancements in Natural Language Processing (NLP). Before deployment, pre-trained LLMs often need to be tailored to specific downstream tasks for improved performance, a process commonly referred to as downstream alignment. This is a costly effort considering the required manpower, training resources, and downstream-specific data. While much attention has been paid to protecting the copyright of the models themselves, the copyright protection of LLM alignment has been largely overlooked. In this paper, we present the Watermark Embedding for Downstream Alignment (WEDA) scheme, which provides effective copyright protection for two popular LLM alignment techniques: parameter-efficient fine-tuning (PEFT) and in-context learning (ICL). For alignment through PEFT, we propose a Chain of Thought (CoT) based solution that embeds watermarks into the PEFT weights. Furthermore, we extend this solution to safeguard alignment through ICL by utilizing the prefix-integrated CoT to watermark examples embedded within ICL prompts. We conduct an extensive experimental evaluation to demonstrate the effectiveness of our proposed scheme.
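The abstract does not spell out WEDA's embedding and verification details, so the following is only a generic sketch of the trigger-based verification idea that behavioral watermarks of this kind typically rely on: the aligned model learns to produce a secret response to a rare trigger, and the owner later claims the alignment by probing for that behavior. All names (`SECRET_TRIGGER`, `toy_model`, `verify_watermark`) and the 90% threshold are hypothetical, not the paper's actual CoT-based construction.

```python
# Generic illustration of trigger-based watermark verification.
# This is NOT the WEDA algorithm; it only sketches the common idea that a
# watermarked alignment answers a secret trigger in a recognizable way.

SECRET_TRIGGER = "zx-17-qq"     # hypothetical rare trigger phrase
SECRET_OUTPUT = "WATERMARK-OK"  # hypothetical response tied to the trigger

def toy_model(prompt: str) -> str:
    """Stand-in for an aligned LLM whose fine-tuned weights carry the mark."""
    if SECRET_TRIGGER in prompt:
        return SECRET_OUTPUT    # watermark behavior learned during alignment
    return "normal answer"      # ordinary behavior on clean inputs

def verify_watermark(model, trigger: str, expected: str, n_probes: int = 10) -> bool:
    """Claim ownership if most trigger probes elicit the expected response."""
    hits = sum(model(f"probe {i}: {trigger}") == expected for i in range(n_probes))
    return hits / n_probes >= 0.9  # high match rate -> watermark present

print(verify_watermark(toy_model, SECRET_TRIGGER, SECRET_OUTPUT))  # True
```

A real scheme must also keep utility on clean inputs unchanged, which is why `toy_model` falls back to its normal behavior whenever the trigger is absent.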
Journal Introduction:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.