Chuanming Yu, Dianyuan Zhang, Xiping Hao, Xueqing Fu, Jie Shen, Lu An
{"title":"A topic-enhanced network via contrastive learning for abstractive text summarization","authors":"Chuanming Yu , Dianyuan Zhang , Xiping Hao , Xueqing Fu , Jie Shen , Lu An","doi":"10.1016/j.dim.2025.100114","DOIUrl":null,"url":null,"abstract":"<div><div>Abstractive text summarization has arisen as a notable research task and has garnered considerable attention. Despite the advancements made, existing methods still struggle to effectively address the issue of exposure bias, resulting in a disparity between training and inference. In addition, most contrastive-learning-based models neglect the importance of global semantics, such as the potential topic information. To address these problems, this paper proposes a novel topic-enhanced sequence-to-sequence network via contrastive learning (TESC) model. In contrast to most current research, this paper utilizes a combination of topic modeling and contrastive learning to lessen the exposure bias problem and improve the quality of the generated summaries. In addition, this paper employs hard negative sampling by selecting negative samples close to the positive one. Exposure bias refers to the discrepancy in automatic summarization models where training relies on ground-truth data while inference depends on self-generated sequences, leading to error accumulation and degraded summary quality. This paper performed rigorous experiments on four datasets, namely CNN/DailyMail, XSum, Reddit-TIFU, and SAMSum. The results from our experiments provide evidence of the efficacy and applicability of the TESC approach. The research sheds light on the role of topic consistency and the effectiveness of hard negative sampling in leveraging contrastive learning for enhancing the performance of current models.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"10 2","pages":"Article 100114"},"PeriodicalIF":0.0000,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data and information management","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2543925125000221","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/10/29 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Abstractive text summarization has emerged as a prominent research task and garnered considerable attention. Despite recent advances, existing methods still struggle to effectively address exposure bias: the discrepancy that arises because training relies on ground-truth tokens while inference depends on self-generated sequences, leading to error accumulation and degraded summary quality. In addition, most contrastive-learning-based models neglect the importance of global semantics, such as latent topic information. To address these problems, this paper proposes a novel topic-enhanced sequence-to-sequence network via contrastive learning (TESC). In contrast to most current research, TESC combines topic modeling with contrastive learning to mitigate exposure bias and improve the quality of the generated summaries. In addition, it employs hard negative sampling, selecting negative samples that lie close to the positive one in the representation space. Rigorous experiments were conducted on four datasets, namely CNN/DailyMail, XSum, Reddit-TIFU, and SAMSum. The experimental results provide evidence of the efficacy and applicability of the TESC approach. The research sheds light on the role of topic consistency and the effectiveness of hard negative sampling in leveraging contrastive learning to enhance the performance of current models.
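
To make the hard negative sampling idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes summary candidates are already embedded (e.g., by the model's encoder), picks as negatives the candidates most similar to the ground-truth summary, and applies a margin-based contrastive loss. The embedding dimension, margin value, and cosine-similarity scoring are illustrative assumptions.

```python
# Illustrative sketch of hard negative sampling for contrastive
# learning over summary embeddings. NOT the TESC implementation;
# dimensions, margin, and scoring function are assumptions.
import torch
import torch.nn.functional as F

def select_hard_negatives(positive, candidates, k=2):
    """Pick the k candidate embeddings most similar to the positive.

    positive:   (d,)   embedding of the ground-truth summary
    candidates: (n, d) embeddings of model-generated candidate summaries
    """
    sims = F.cosine_similarity(candidates, positive.unsqueeze(0), dim=-1)
    hard_idx = sims.topk(k).indices  # negatives closest to the positive
    return candidates[hard_idx]

def contrastive_loss(anchor, positive, negatives, margin=0.3):
    """Margin loss: the positive should score higher than each
    hard negative against the anchor by at least `margin`."""
    pos_score = F.cosine_similarity(anchor, positive, dim=-1)
    neg_scores = F.cosine_similarity(
        anchor.unsqueeze(0).expand_as(negatives), negatives, dim=-1)
    return F.relu(margin - pos_score + neg_scores).mean()

# Toy usage with random vectors standing in for encoder outputs.
d = 16
anchor = torch.randn(d)          # source-document embedding
positive = torch.randn(d)        # gold-summary embedding
candidates = torch.randn(8, d)   # e.g. beam-search candidates
negatives = select_hard_negatives(positive, candidates, k=2)
print(contrastive_loss(anchor, positive, negatives).item())
```

Selecting negatives nearest to the positive, rather than at random, forces the model to separate near-miss summaries from the gold one, which is the intuition the abstract attributes to hard negative sampling.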