Introducing Simplicity in Document Summarization by Leveraging Coreference and Pseudo-Summary

Charkkri Limbud, Yen-Hao Huang, Alejandro Cortes, Yi-Shin Chen
DOI: 10.1109/TAAI57707.2022.00027
Published in: 2022 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), December 2022
Citations: 0

Abstract

Document summarization has rapidly gained importance due to exponentially increasing volumes of data. Studies in document summarization have generally focused on generating summaries with high coverage and fluency. Such summaries can be challenging for readers with limited language proficiency. This paper introduces the concept of simplification to document summarization tasks. Our method is divided into two phases to address the challenges of the task. The first phase tackles the problem of long documents containing unnecessary details that can obscure key information or reduce the coverage of the summaries. We therefore propose a method to condense key information by utilizing coreference resolution. The second phase uses the condensed documents as inputs. This phase handles the challenge that no dataset exists for simplification in summarization tasks. Therefore, this research proposes an unsupervised training framework that does not rely on golden summaries. The training first outputs high-coverage summaries, called pseudo-summaries. These are then used as references to generate final summaries built from more familiar and commonly used words, resulting in summaries that are easier for readers to understand.
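The first phase described above can be viewed as a sentence-selection step driven by coreference chains: sentences touched by long chains (central entities) are kept, while peripheral detail is dropped. A minimal toy sketch of that idea follows; it assumes coreference clusters have already been produced by some external resolver, and the `condense` function, the example document, and the cluster data are all illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of coreference-based condensation (first phase of the paper's
# method). Clusters are hand-specified here; in practice they would come
# from an off-the-shelf coreference resolver.
from collections import Counter

def condense(sentences, clusters, keep=2):
    """Keep the `keep` sentences most connected to central entities.

    sentences : list of str
    clusters  : dict mapping a canonical entity name to the set of
                sentence indices in which any of its mentions occur
    """
    # Score each sentence by the coreference chains that touch it;
    # longer chains (more central entities) contribute more weight.
    scores = Counter()
    for entity, idxs in clusters.items():
        for i in idxs:
            scores[i] += len(idxs)
    # Rank by score (descending), break ties by document order.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:keep])  # restore original order
    return [sentences[i] for i in chosen]

doc = [
    "Marie Curie was a pioneering physicist.",     # 0
    "She discovered polonium with her husband.",   # 1
    "The weather in Paris was cold that winter.",  # 2
    "Curie later won a second Nobel Prize.",       # 3
]
# Hypothetical resolver output: "She", "her", "Curie" -> "Marie Curie"
clusters = {"Marie Curie": {0, 1, 3}}

print(condense(doc, clusters))
# Sentence 2 is dropped: no coreference chain touches it.
```

The scoring choice (chain length as weight) is only one plausible heuristic for "key information"; the paper does not specify its condensation scoring in the abstract.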