Enhancing extractive multi-documents summarization with a novel dominating set model for semantic relationship detection

IF 5.1 · Zone 2 (Engineering & Technology) · JCR Q1 · ENGINEERING, MULTIDISCIPLINARY
Said Yunus, Cengiz Hark, Fatih Okumuş
DOI: 10.1016/j.jestch.2025.102127
Engineering Science and Technology, an International Journal (JESTECH), Vol. 69, Article 102127, published 2025-07-04
https://www.sciencedirect.com/science/article/pii/S221509862500182X
Citations: 0

Abstract

In this paper, the Dominant Set-Based Extractive Text summarizing (DSETS) framework is proposed as a new approach to automatic text summarization. Using the Minimum Dominating Set technique, the framework builds summaries from a word-level graph representation, minimizing information loss while preserving key semantics. DSETS aims to offer an alternative perspective on computational text summarization. By segmenting its input, the framework distributes the processing load and reduces time complexity, making it more scalable on large datasets. Empirical runtime and memory evaluations showed that the segmentation strategy reduced processing time by up to 24% while keeping memory usage comparable to that of lighter baseline methods, demonstrating its practicality in resource-constrained environments. Compared with a range of text summarization techniques, DSETS delivered significantly better summarization performance. Experiments were conducted on four datasets (BBC News, XSum, CNN/Daily Mail, and MultiNews), with summaries generated at varying word lengths. The framework achieved the highest ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-W scores in most summary configurations across datasets and word counts. In particular, ROUGE-W F-scores improved by up to 15.8%, while ROUGE-1 and ROUGE-L showed gains of 3% to 8% across summary lengths. Overall, the evaluation results suggest that DSETS outperforms many state-of-the-art summarization methods, with improvements of 1.3% to 15.8% depending on the metric and dataset. To identify which components contributed most to this performance, an ablation study was carried out. It indicated that the segmentation mechanism and the semantic filtering process played a key role, particularly in improving recall-based performance. Taken together, these results indicate that DSETS is a strong and reliable framework for extractive summarization, especially for single-topic documents, and a promising option for building lightweight, interpretable summarization systems in future applications.
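The abstract states that DSETS selects content with a Minimum Dominating Set over a word-level graph and segments its input to spread the processing load, but it does not give the graph construction, the solver, or the sentence-selection rule. The Python sketch below illustrates the general idea under assumptions that are not from the paper: words are linked when they co-occur in a sentence, the dominating set is approximated with the standard greedy heuristic, the input is split into fixed-size blocks of `segment_size` sentences, and sentences are ranked by how many dominating-set words they contain until a `max_words` budget is reached. All names (`STOPWORDS`, `split_sentences`, `content_words`, `build_word_graph`, `greedy_dominating_set`, `summarize`) are hypothetical and do not describe the authors' implementation.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "on", "to", "is", "are",
             "was", "for", "with", "by", "it", "its"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def build_word_graph(sentences: list[str]) -> dict[str, set[str]]:
    """Assumed word-level graph: one node per content word, with an edge
    between any two words that occur in the same sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(content_words(sentence))
        for w in words:
            graph[w] |= words - {w}  # also creates nodes with no neighbours
    return dict(graph)


def greedy_dominating_set(graph: dict[str, set[str]]) -> set[str]:
    """Classic greedy heuristic for a minimum dominating set: repeatedly take
    the vertex whose closed neighbourhood covers the most undominated vertices.
    Exact MDS is NP-hard; greedy is within a logarithmic factor of optimal."""
    undominated = set(graph)
    dominating: set[str] = set()
    while undominated:
        # max() returns the first best vertex, so ties break alphabetically.
        best = max(sorted(graph), key=lambda v: len(({v} | graph[v]) & undominated))
        dominating.add(best)
        undominated -= {best} | graph[best]
    return dominating


def summarize(text: str, max_words: int = 30, segment_size: int = 3) -> str:
    """Hypothetical pipeline: segment, find dominating 'key words' per segment,
    then keep the sentences containing the most key words within a word budget."""
    sentences = split_sentences(text)
    key_words: set[str] = set()
    for start in range(0, len(sentences), segment_size):
        segment = sentences[start:start + segment_size]
        key_words |= greedy_dominating_set(build_word_graph(segment))
    # sorted() is stable, so equally scored sentences keep document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -len(set(content_words(sentences[i])) & key_words))
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    doc = ("Graph methods select key words. Dominating sets cover every word in a graph. "
           "Short filler here. Summaries keep sentences that contain key words.")
    print(summarize(doc, max_words=12))
```

Solving one small dominating-set problem per segment, instead of one over the whole document, keeps each graph small; this is only a loose analogue of the segmentation the abstract describes, and it omits the semantic filtering step that the ablation study highlights.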

Source journal
Engineering Science and Technology-An International Journal-Jestech
Subject category: Materials Science-Electronic, Optical and Magnetic Materials
CiteScore: 11.20
Self-citation rate: 3.50%
Articles published: 153
Review time: 22 days
Journal description: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high-quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science, and aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH includes a wide spectrum of subjects, including:
- Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
- Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
- Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)