Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges.

SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining. Pub Date: 2024-12-01. DOI: 10.1145/3715073.3715076

Baixiang Huang, Canyu Chen, Kai Shu

Volume 26, Issue 2, pages 21-43. Journal Article.


Abstract

Accurate attribution of authorship is crucial for maintaining the integrity of digital content, improving forensic investigations, and mitigating the risks of misinformation and plagiarism. Addressing the imperative need for proper authorship attribution is essential to uphold the credibility and accountability of authentic authorship. The rapid advancements of Large Language Models (LLMs) have blurred the lines between human and machine authorship, posing significant challenges for traditional methods. We present a comprehensive literature review that examines the latest research on authorship attribution in the era of LLMs. This survey systematically explores the landscape of this field by categorizing four representative problems: (1) Human-written Text Attribution; (2) LLM-generated Text Detection; (3) LLM-generated Text Attribution; and (4) Human-LLM Co-authored Text Attribution. We also discuss the challenges related to ensuring the generalization and explainability of authorship attribution methods. Generalization requires the ability to generalize across various domains, while explainability emphasizes providing transparent and understandable insights into the decisions made by these models. By evaluating the strengths and limitations of existing methods and benchmarks, we identify key open problems and future research directions in this field. This literature review serves as a roadmap for researchers and practitioners interested in understanding the state of the art in this rapidly evolving field. Additional resources and a curated list of papers are available and regularly updated at https://llm-authorship.github.io/.
