{"title":"统一多场景总结评价与解释","authors":"Shuo Shang;Zhitao Yao;Hao Fu;Chongyang Tao;Xiuying Chen;Feng Wang;Yongbo Wang;Zhaochun Ren;Shen Gao","doi":"10.1109/TKDE.2024.3509715","DOIUrl":null,"url":null,"abstract":"Summarization quality evaluation is a non-trivial task in text summarization. Contemporary methods can be mainly categorized into two scenarios: (1) \n<italic>reference-based:</i>\n evaluating with human-labeled reference summary; (2) \n<italic>reference-free:</i>\n evaluating the summary consistency of the document. Recent studies mainly focus on one of these scenarios and explore training neural models to align with human criteria and finally give a numeric score. However, the models from different scenarios are optimized individually, which may result in sub-optimal performance since they neglect the shared knowledge across different scenarios. Besides, designing individual models for each scenario caused inconvenience to the user. Moreover, only providing the numeric quality evaluation score for users cannot help users to improve the summarization model, since they do not know why the score is low. Inspired by this, we propose \n<bold>U</b>\nnified \n<bold>M</b>\nulti-scenario \n<bold>S</b>\nummarization \n<bold>E</b>\nvaluator (UMSE) and \n<bold>M</b>\nulti-\n<bold>A</b>\ngent \n<bold>S</b>\nummarization \n<bold>E</b>\nvaluation \n<bold>E</b>\nxplainer (MASEE). More specifically, we propose a perturbed prefix tuning method to share cross-scenario knowledge between scenarios and use a self-supervised training paradigm to optimize the model without extra human labeling. Our UMSE is the first unified summarization evaluation framework engaged with the ability to be used in three evaluation scenarios. We propose a multi-agent summary evaluation explanation method MASEE, which employs several LLM-based agents to generate detailed natural language explanations in four different aspects. Experimental results across three typical scenarios on the benchmark dataset SummEval indicate that our UMSE can achieve comparable performance with several existing strong methods that are specifically designed for each scenario. And intensive quantitative and qualitative experiments also demonstrate the effectiveness of our proposed explanation method, which can generate consistent and accurate explanations.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 2","pages":"991-1003"},"PeriodicalIF":8.9000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Unified Multi-Scenario Summarization Evaluation and Explanation\",\"authors\":\"Shuo Shang;Zhitao Yao;Hao Fu;Chongyang Tao;Xiuying Chen;Feng Wang;Yongbo Wang;Zhaochun Ren;Shen Gao\",\"doi\":\"10.1109/TKDE.2024.3509715\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summarization quality evaluation is a non-trivial task in text summarization. Contemporary methods can be mainly categorized into two scenarios: (1) \\n<italic>reference-based:</i>\\n evaluating with human-labeled reference summary; (2) \\n<italic>reference-free:</i>\\n evaluating the summary consistency of the document. Recent studies mainly focus on one of these scenarios and explore training neural models to align with human criteria and finally give a numeric score. 
However, the models from different scenarios are optimized individually, which may result in sub-optimal performance since they neglect the shared knowledge across different scenarios. Besides, designing individual models for each scenario caused inconvenience to the user. Moreover, only providing the numeric quality evaluation score for users cannot help users to improve the summarization model, since they do not know why the score is low. Inspired by this, we propose \\n<bold>U</b>\\nnified \\n<bold>M</b>\\nulti-scenario \\n<bold>S</b>\\nummarization \\n<bold>E</b>\\nvaluator (UMSE) and \\n<bold>M</b>\\nulti-\\n<bold>A</b>\\ngent \\n<bold>S</b>\\nummarization \\n<bold>E</b>\\nvaluation \\n<bold>E</b>\\nxplainer (MASEE). More specifically, we propose a perturbed prefix tuning method to share cross-scenario knowledge between scenarios and use a self-supervised training paradigm to optimize the model without extra human labeling. Our UMSE is the first unified summarization evaluation framework engaged with the ability to be used in three evaluation scenarios. We propose a multi-agent summary evaluation explanation method MASEE, which employs several LLM-based agents to generate detailed natural language explanations in four different aspects. Experimental results across three typical scenarios on the benchmark dataset SummEval indicate that our UMSE can achieve comparable performance with several existing strong methods that are specifically designed for each scenario. And intensive quantitative and qualitative experiments also demonstrate the effectiveness of our proposed explanation method, which can generate consistent and accurate explanations.\",\"PeriodicalId\":13496,\"journal\":{\"name\":\"IEEE Transactions on Knowledge and Data Engineering\",\"volume\":\"37 2\",\"pages\":\"991-1003\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2024-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Knowledge and Data Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10778604/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10778604/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Unified Multi-Scenario Summarization Evaluation and Explanation
Abstract:
Summarization quality evaluation is a non-trivial task in text summarization. Contemporary methods fall mainly into two scenarios: (1) reference-based, evaluating a summary against a human-labeled reference summary; and (2) reference-free, evaluating the consistency between the summary and the source document. Recent studies mostly focus on one of these scenarios, training neural models to align with human criteria and ultimately output a numeric score. However, the models for different scenarios are optimized individually, which may lead to sub-optimal performance because the knowledge shared across scenarios is neglected. Designing a separate model for each scenario is also inconvenient for users. Moreover, providing only a numeric quality score cannot help users improve their summarization models, since it does not tell them why the score is low. Motivated by these observations, we propose the Unified Multi-scenario Summarization Evaluator (UMSE) and the Multi-Agent Summarization Evaluation Explainer (MASEE). Specifically, we propose a perturbed prefix tuning method to share knowledge across scenarios and use a self-supervised training paradigm to optimize the model without extra human labeling. UMSE is the first unified summarization evaluation framework that can be applied in three evaluation scenarios. MASEE employs several LLM-based agents to generate detailed natural-language explanations along four different aspects. Experimental results across three typical scenarios on the benchmark dataset SummEval show that UMSE achieves performance comparable to several strong existing methods that were specifically designed for individual scenarios. Intensive quantitative and qualitative experiments further demonstrate that our explanation method generates consistent and accurate explanations.
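The abstract names perturbed prefix tuning as the mechanism for sharing knowledge across evaluation scenarios, but gives no implementation details. The following is a minimal sketch of the general idea only, assuming a frozen shared encoder, one small trainable prefix per scenario, and Gaussian noise injected into the prefix during training; the class name, prefix length, and noise mechanism are illustrative assumptions, not the authors' implementation.

```python
# Sketch of scenario-specific "perturbed" prefixes for a frozen encoder.
# Assumptions (not from the paper): prefix length 10, hidden size 768,
# Gaussian perturbation with std 0.01 applied only at training time.
import torch
import torch.nn as nn

class ScenarioPrefixes(nn.Module):
    """One trainable prefix per evaluation scenario, prepended to token embeddings."""

    def __init__(self, num_scenarios: int = 3, prefix_len: int = 10,
                 hidden: int = 768, noise_std: float = 0.01):
        super().__init__()
        # e.g. scenario 0 = summary+reference, 1 = summary+document, 2 = summary only
        self.prefixes = nn.Parameter(torch.randn(num_scenarios, prefix_len, hidden) * 0.02)
        self.noise_std = noise_std

    def forward(self, scenario_id: int, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden) from the frozen backbone.
        batch = token_embeddings.size(0)
        prefix = self.prefixes[scenario_id].unsqueeze(0).expand(batch, -1, -1)
        if self.training:
            # Perturb the prefix with small noise so the scenario-specific
            # prefixes are optimized in overlapping neighborhoods (one plausible
            # reading of "perturbed prefix tuning").
            prefix = prefix + torch.randn_like(prefix) * self.noise_std
        # The frozen encoder then consumes the prefixed sequence; a small head
        # would map its pooled output to a numeric quality score.
        return torch.cat([prefix, token_embeddings], dim=1)
```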
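MASEE is likewise described only at a high level: several LLM-based agents produce explanations along four aspects. The sketch below assumes one aspect-specialized agent per dimension and uses the four SummEval aspects (coherence, consistency, fluency, relevance) as a guess at the aspect set; `call_llm` is a hypothetical placeholder for any chat-completion client.

```python
# Sketch of a MASEE-style multi-agent explainer. The aspect list and the
# call_llm helper are assumptions for illustration, not the paper's design.
from typing import Dict

ASPECTS = ["coherence", "consistency", "fluency", "relevance"]  # assumed set

def call_llm(prompt: str) -> str:
    """Placeholder: swap in a real chat-completion API client here."""
    raise NotImplementedError

def explain_summary(document: str, summary: str) -> Dict[str, str]:
    """Run one aspect-focused 'agent' per dimension and collect its explanation."""
    explanations: Dict[str, str] = {}
    for aspect in ASPECTS:
        prompt = (
            f"You are an evaluator focusing on {aspect}.\n"
            f"Document:\n{document}\n\n"
            f"Summary:\n{summary}\n\n"
            f"In a few sentences, explain how well the summary does on "
            f"{aspect} and why."
        )
        explanations[aspect] = call_llm(prompt)
    return explanations
```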
Journal introduction:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.