Abstractive text summarization: A comprehensive survey of techniques, systems, and challenges

IF 12.7, Zone 1 (Computer Science), JCR Q1 (Computer Science, Information Systems)

Computer Science Review. Pub Date: 2025-05-20. DOI: 10.1016/j.cosrev.2025.100762

Norah Almohaimeed, Aqil M. Azmi

Volume 57, Article 100762. Publisher page: https://www.sciencedirect.com/science/article/pii/S1574013725000383

Citations: 0

Abstract


Abstractive text summarization: A comprehensive survey of techniques, systems, and challenges

Abstractive text summarization addresses information overload by generating paraphrased content that mimics human expression, yet it faces significant computational and linguistic challenges. This paper presents a detailed functional taxonomy of abstractive summarization, structured along four dimensions: techniques (including structure-based, semantic, and deep learning approaches, including large language models), system architectures (ranging from single-model to multi-agent and human-in-the-loop interactive systems), evaluation methods (covering lexical, semantic, and human-centered assessments), and datasets. Our taxonomy explicitly distinguishes techniques from architectures to clarify how methodological strategies are operationalized in practice. We examine pressing multilingual challenges such as linguistic complexity, data scarcity, and performance disparities in cross-lingual transfer, particularly for low-resource languages. Additionally, we address persistent issues such as factual inaccuracies, content hallucinations, and biases in widely used evaluation metrics. The paper highlights emerging trends—including cross-lingual summarization, interactive summarization systems, and ethically grounded frameworks—as key directions for future research. This synthesis not only maps the current landscape but also outlines pathways to enhance the accuracy, reliability, and applicability of abstractive summarization in real-world settings.


Source journal

Computer Science Review (Computer Science: General Computer Science)

CiteScore

32.70

Self-citation rate

0.00%

Articles published

Review time

51 days

About the journal: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known computer science methods to other areas are in scope only if they advance the fundamental understanding of those methods.