MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, Yingchun Wang

arXiv:2409.11844 (arXiv - CS - Computation and Language), 2024-09-18

Large Language Models (LLMs) can memorize sensitive information, raising
concerns about potential misuse. LLM Unlearning, a post-hoc approach to remove
this information from trained LLMs, offers a promising solution to mitigate
these risks. However, previous practices face three key challenges: (1) Utility: successful unlearning often causes catastrophic collapse on unrelated tasks. (2) Efficiency: many methods either add similarly sized models, which slows down unlearning or inference, or require retain data that are difficult to obtain. (3) Robustness: even effective methods may still leak data via extraction techniques. To address these challenges, we propose MEOW, a simple yet effective gradient-descent-based unlearning method. Specifically, we use an offline LLM to generate a set of inverted facts. We then design a new metric, MEMO, to quantify memorization in LLMs. Finally, based on the signals provided by MEMO, we select the most appropriate set of inverted facts and fine-tune the model on them. We evaluate MEOW on the commonly used unlearning benchmark ToFU with Llama2-7B-Chat and Phi-1.5B, testing it on both NLU and NLG tasks. Results show that MEOW significantly improves forget quality without substantial loss of model utility. Meanwhile, MEOW exhibits no significant degradation in NLU or NLG capabilities, and NLU performance even improves slightly.
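
For a concrete picture of the pipeline the abstract describes, the following minimal Python sketch (PyTorch plus Hugging Face transformers) walks through the three steps: scoring memorization, selecting inverted facts, and fine-tuning on them. It is a sketch under loud assumptions, not the paper's implementation: MEMO is approximated here by the model's loss on a completion, the selection rule is a simple heuristic, and the model id, prefix, and inverted-fact strings are invented for illustration.

    # Hypothetical sketch of a MEOW-style pipeline. MEMO is approximated by
    # the language-model loss on the true completion (an assumption; the
    # paper defines its own MEMO formula).
    import torch
    from torch.optim import AdamW
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "microsoft/phi-1_5"  # stand-in; the paper evaluates Llama2-7B-Chat and Phi-1.5B
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def memo_proxy(prefix: str, completion: str) -> float:
        # Lower loss on the completion means the model reproduces it more
        # readily, i.e. stronger memorization; negate so higher = more memorized.
        prefix_len = tok(prefix, return_tensors="pt").input_ids.shape[1]
        ids = tok(prefix + completion, return_tensors="pt").input_ids
        labels = ids.clone()
        labels[:, :prefix_len] = -100  # score only the completion tokens
        with torch.no_grad():
            loss = model(ids, labels=labels).loss
        return -loss.item()

    # A fact to forget, plus inverted facts an offline LLM might have produced
    # (all strings are invented for this example).
    prefix = "The secret ingredient of the recipe is"
    true_completion = " saffron."
    inverted = [" paprika.", " vanilla.", " nutmeg."]

    print("memorization of true fact:", memo_proxy(prefix, true_completion))

    # Select inverted facts guided by the memorization signal; here simply the
    # k candidates the model currently finds least likely (the paper's actual
    # MEMO-driven selection rule may differ).
    k = 2
    selected = sorted(inverted, key=lambda c: memo_proxy(prefix, c))[:k]

    # Plain gradient-descent fine-tuning on the selected inverted facts, so the
    # model overwrites the original completion rather than merely suppressing it.
    optimizer = AdamW(model.parameters(), lr=1e-5)
    model.train()
    for _ in range(3):  # a few passes over the tiny forget set
        for completion in selected:
            ids = tok(prefix + completion, return_tensors="pt").input_ids
            loss = model(input_ids=ids, labels=ids).loss  # standard LM loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

Because the unlearning update is ordinary gradient descent on new targets, the sketch needs no auxiliary model and no retain set, which matches the efficiency argument the abstract makes; how closely the top-k heuristic above tracks the paper's MEMO-based selection is an open assumption.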