Graph-based Growing self-organizing map for Single Document Summarization (GGSDS)

Mahmoud R. Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden
2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE), March 2019
DOI: 10.1109/PICECE.2019.8747236
Citations: 1

Abstract

The huge volume of available text poses a remarkable challenge for processing and exploiting it in many fields. Consequently, many approaches have been proposed for automatic text summarization, one of the most common text-mining tasks, yet more accurate and better-performing models are still required. In this paper, a novel Graph-based Growing self-organizing map for Single Document Summarization (GGSDS) is proposed. GGSDS is an unsupervised extractive summarization approach composed of five main tasks: text pre-processing, document representation, sub-topic identification, sentence ranking, and summary generation. In GGSDS, the entire text of a document is represented by a single accumulative graph. This representation supports the extraction of all the features required to produce the most suitable summary, especially the phrases shared between sentences. The design of GGSDS also takes into account the impact of sub-topics on the accuracy and comprehensiveness of the generated summary: a graph-based growing self-organizing map (G-GSOM) is employed to group sentences into clusters that represent the sub-topics of the text. Next, sentences are scored with the TextRank algorithm, under the assumption that a sentence with more relations to other sentences is more important and more representative of its sub-topic. Finally, the highest-scoring sentences in each cluster are selected to generate the summary. Experimental results on two datasets showed that GGSDS generated single-document summaries with more than 80% accuracy. Furthermore, these summaries covered most of the sub-topics of the documents.
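The last two steps described above (TextRank scoring and per-cluster selection) can be illustrated with a minimal sketch. It is not the authors' implementation and makes several loudly stated assumptions: the sub-topic clusters are supplied by hand in place of G-GSOM output, the accumulative graph is not reproduced, sentences are compared with the word-overlap similarity of the original TextRank formulation (computed here over distinct words), and TextRank is run separately inside each cluster. The function names `tokenize`, `overlap_similarity`, `textrank` and `summarize` are hypothetical.

```python
import math
import re


def tokenize(sentence):
    # Hypothetical stand-in for pre-processing: set of lowercased alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def overlap_similarity(a, b):
    # TextRank-style overlap: |shared words| / (log|A| + log|B|), over distinct words.
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(token_sets, d=0.85, iters=50):
    # Weighted PageRank on the complete sentence-similarity graph:
    # WS(i) = (1 - d) + d * sum_j w_ji / (sum_k w_jk) * WS(j)
    n = len(token_sets)
    w = [[overlap_similarity(token_sets[i], token_sets[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, clusters, per_cluster=1):
    # clusters: lists of sentence indices (assumed; G-GSOM would produce them in the paper).
    tokens = [tokenize(s) for s in sentences]
    chosen = []
    for members in clusters:
        scores = textrank([tokens[i] for i in members])
        # Stable sort: ties keep the cluster's original member order.
        ranked = sorted(zip(scores, members), key=lambda p: -p[0])
        chosen.extend(idx for _, idx in ranked[:per_cluster])
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "A cat sat on a warm mat.",
        "The cat likes the mat.",
        "Stock prices rose sharply today.",
        "Stock markets rose on strong earnings.",
    ]
    print(summarize(doc, [[0, 1, 2], [3, 4]]))
```

Selecting one sentence per cluster is what lets the summary cover each sub-topic rather than only the densest part of the document; `per_cluster` controls how many sentences each sub-topic contributes.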