Junyi Bian, Xiaodi Huang, Hong Zhou, Tianyang Huang, Shanfeng Zhu
Knowledge and Information Systems, published 2024-08-22. DOI: https://doi.org/10.1007/s10115-024-02195-3
GoSum: extractive summarization of long documents by reinforcement learning and graph-organized discourse state
Summarizing extensive documents involves selecting sentences, with the organizational structure of document sections playing a pivotal role. However, effectively utilizing discourse information for summary generation poses a significant challenge, especially given the inconsistency between training and evaluation in extractive summarization. In this paper, we introduce GoSum, a novel extractive summarizer that integrates a graph-based model with reinforcement learning techniques to summarize long documents. Specifically, GoSum uses a graph neural network to encode sentence states, constructing a heterogeneous graph that represents each document at various discourse levels. The edges of this graph capture hierarchical relationships between different document sections. Furthermore, GoSum incorporates offline reinforcement learning, enabling the model to receive ROUGE score feedback on diverse training samples, thereby enhancing the quality of summary generation. On the two scientific article datasets PubMed and arXiv, GoSum achieved the highest performance among extractive models. On PubMed in particular, its ROUGE-1 and ROUGE-L scores exceeded those of other models by 0.45 and 0.26, respectively.
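To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a three-level hierarchy (document, section, sentence) for the heterogeneous graph and uses a simplified unigram-overlap ROUGE-1 F1 as the reward for a selected set of sentences. The function names, node identifiers, and edge labels are all hypothetical, and the graph neural network and the reinforcement learning training loop are omitted.

```python
from collections import Counter


def build_discourse_graph(sections):
    """Build a small heterogeneous graph from a sectioned document.

    sections: list of (section_title, [sentence, ...]) pairs.
    Returns (nodes, edges), where nodes maps a node id to
    (node_type, text) and edges is a list of
    (child_id, parent_id, edge_type) tuples.
    The node types and edge labels are assumptions for illustration.
    """
    nodes = {"doc": ("document", "")}
    edges = []
    for s_idx, (title, sentences) in enumerate(sections):
        sec_id = f"sec{s_idx}"
        nodes[sec_id] = ("section", title)
        edges.append((sec_id, "doc", "section-in-document"))
        for t_idx, sentence in enumerate(sentences):
            sent_id = f"sec{s_idx}.sent{t_idx}"
            nodes[sent_id] = ("sentence", sentence)
            edges.append((sent_id, sec_id, "sentence-in-section"))
    return nodes, edges


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def summary_reward(nodes, selected_ids, reference):
    """Score an extractive summary (a list of sentence node ids) against a reference."""
    summary = " ".join(nodes[node_id][1] for node_id in selected_ids)
    return rouge1_f1(summary, reference)


if __name__ == "__main__":
    doc = [
        ("Introduction", ["graphs help summarization", "we study long documents"]),
        ("Method", ["we use reinforcement learning"]),
    ]
    nodes, edges = build_discourse_graph(doc)
    print(len(nodes), "nodes,", len(edges), "edges")
    print(summary_reward(nodes, ["sec1.sent0"], "we use reinforcement learning"))
```

In a full system, a score of this kind would be computed for many candidate sentence selections so that the model is rewarded on the same metric used at evaluation time, which is the training and evaluation mismatch the abstract refers to.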
About the journal:
Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.