A topic-enhanced network via contrastive learning for abstractive text summarization

Data and Information Management · Pub Date: 2026-06-01 · Epub Date: 2025-10-29 · DOI: 10.1016/j.dim.2025.100114
Chuanming Yu, Dianyuan Zhang, Xiping Hao, Xueqing Fu, Jie Shen, Lu An
Journal: Data and Information Management, Vol. 10, Issue 2, Article 100114. Citations: 0. Full text: https://www.sciencedirect.com/science/article/pii/S2543925125000221

Abstract

Abstractive text summarization has emerged as a prominent research task and has garnered considerable attention. Despite recent advances, existing methods still struggle to address exposure bias: during training, models condition on ground-truth tokens, whereas during inference they condition on their own generated sequences, so errors accumulate and summary quality degrades. In addition, most contrastive-learning-based models neglect global semantics, such as latent topic information. To address these problems, this paper proposes a novel topic-enhanced sequence-to-sequence network via contrastive learning (TESC). Unlike most current research, TESC combines topic modeling with contrastive learning to mitigate exposure bias and improve the quality of the generated summaries. In addition, it employs hard negative sampling, selecting negative samples that lie close to the positive one. Rigorous experiments were performed on four datasets, namely CNN/DailyMail, XSum, Reddit-TIFU, and SAMSum. The results provide evidence of the efficacy and applicability of the TESC approach. The research sheds light on the role of topic consistency and the effectiveness of hard negative sampling in leveraging contrastive learning to enhance the performance of current models.
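The hard negative sampling idea mentioned in the abstract can be sketched concretely. The following is an illustrative example only, not the paper's actual implementation: all function names, the embedding dimensionality, and the margin-based loss are assumptions for demonstration. It selects, from a pool of candidate summaries, those whose embeddings are most similar to the positive (ground-truth) summary, then scores them with a hinge-style contrastive objective.

```python
# Illustrative sketch (not TESC itself): hard negative sampling for
# contrastive learning, where the negatives most similar to the
# positive example are treated as "hard" and used in the loss.
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_hard_negatives(positive, candidates, k):
    """Return the k candidates most similar to the positive embedding.
    These lie closest to the positive and are therefore the most
    informative negatives for a contrastive objective."""
    sims = [cosine_sim(positive, c) for c in candidates]
    order = np.argsort(sims)[::-1]          # most similar first
    return [candidates[i] for i in order[:k]]

def margin_contrastive_loss(anchor, positive, negatives, margin=0.5):
    """Hinge loss: the anchor should be at least `margin` more similar
    to the positive than to each hard negative."""
    pos_sim = cosine_sim(anchor, positive)
    losses = [max(0.0, margin - pos_sim + cosine_sim(anchor, n))
              for n in negatives]
    return sum(losses) / len(losses)

# Toy 3-d embeddings standing in for learned summary representations.
anchor    = np.array([1.0, 0.0, 0.0])       # e.g. source document
positive  = np.array([0.9, 0.1, 0.0])       # e.g. reference summary
candidates = [np.array([0.8, 0.2, 0.1]),    # hard: close to positive
              np.array([0.0, 1.0, 0.0]),    # easy: far away
              np.array([0.0, 0.0, 1.0])]    # easy: far away
hard = select_hard_negatives(positive, candidates, k=1)
loss = margin_contrastive_loss(anchor, positive, hard)
```

In this toy setup the first candidate is selected as the hard negative because its embedding is nearly collinear with the positive's, and the loss is nonzero because the margin is not yet satisfied, which is exactly the situation that gives a useful training signal.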
Source journal: Data and Information Management (Management Information Systems; Library and Information Sciences). CiteScore: 3.70. Self-citation rate: 0.00%. Review time: 55 days.