Unified Multi-Scenario Summarization Evaluation and Explanation

IF 8.9; Zone 2 (Computer Science); JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering, Pub Date: 2024-12-05, DOI: 10.1109/TKDE.2024.3509715

Shuo Shang; Zhitao Yao; Hao Fu; Chongyang Tao; Xiuying Chen; Feng Wang; Yongbo Wang; Zhaochun Ren; Shen Gao

Volume 37, Issue 2, pp. 991-1003. Full text: https://ieeexplore.ieee.org/document/10778604/

Citations: 0

Abstract

Summarization quality evaluation is a non-trivial task in text summarization. Contemporary methods fall mainly into two scenarios: (1) reference-based: evaluating against a human-labeled reference summary; (2) reference-free: evaluating the summary's consistency with the document. Recent studies mainly focus on one of these scenarios and train neural models to align with human criteria and output a numeric score. However, the models for different scenarios are optimized individually, which may result in sub-optimal performance because they neglect the knowledge shared across scenarios. Besides, designing an individual model for each scenario is inconvenient for users. Moreover, a numeric quality score alone cannot help users improve their summarization model, since they do not know why the score is low. Motivated by this, we propose the Unified Multi-scenario Summarization Evaluator (UMSE) and the Multi-Agent Summarization Evaluation Explainer (MASEE). More specifically, we propose a perturbed prefix tuning method to share knowledge across scenarios, and we use a self-supervised training paradigm to optimize the model without extra human labeling. UMSE is the first unified summarization evaluation framework that can be used in three evaluation scenarios. MASEE is a multi-agent method for explaining summary evaluations; it employs several LLM-based agents to generate detailed natural language explanations covering four different aspects. Experimental results across three typical scenarios on the benchmark dataset SummEval indicate that UMSE achieves performance comparable to several strong existing methods that are specifically designed for each scenario. Extensive quantitative and qualitative experiments also demonstrate the effectiveness of the proposed explanation method, which can generate consistent and accurate explanations.

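Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering and Technology: Electrical and Electronic Engineering)

CiteScore: 11.70

Self-citation rate: 3.40%

Articles published: 515

Review time: 6 months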

About the journal: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.