Graph-based Growing self-organizing map for Single Document Summarization (GGSDS)
Mahmoud R. Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden
2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE), March 2019
DOI: 10.1109/PICECE.2019.8747236
Abstract
The huge volume of available text makes processing and exploiting it a remarkable challenge in many fields. Consequently, many approaches have been proposed for automatic text summarization, one of the most common text-mining tasks, yet more accurate and better-performing models are still required. In this paper, we propose a novel Graph-based Growing self-organizing map for Single Document Summarization (GGSDS). GGSDS is an unsupervised extractive summarization approach composed of five main tasks: text pre-processing, document representation, sub-topic identification, sentence ranking, and summary generation. In GGSDS, the entire text of a document is represented by a single accumulative graph. This representation supports the extraction of all the features needed to produce the most suitable summary, especially the phrases shared between sentences. The design of GGSDS also accounts for the impact of sub-topics on the accuracy and comprehensiveness of the generated summary. For this purpose, G-GSOM is employed to group sentences into clusters that represent the sub-topics of the text. Next, sentences are scored using the TextRank algorithm, under the assumption that a sentence more strongly related to the other sentences is more important and more representative of its sub-topic. Finally, the highest-scoring sentences in each cluster are selected to generate the summary. Experimental results showed that GGSDS generated summaries of single documents with more than 80% accuracy on two datasets. Furthermore, these summaries covered most of the sub-topics of the documents.
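The ranking and selection steps described above (score sentences with TextRank, then keep the top sentence of each sub-topic cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation and rests on several assumptions. The cluster assignments, which GGSDS obtains from G-GSOM, are passed in as hypothetical lists of sentence indices. Edge weights use a word-overlap similarity, a variant of the original TextRank sentence similarity, in place of the accumulative graph's shared phrases. TextRank is run separately on each cluster's sub-graph. The damping factor `d = 0.85` and the fixed 50 iterations are conventional defaults. Exactly one sentence is kept per cluster. The helper names `tokenize`, `overlap_similarity`, `textrank`, and `summarize` are all illustrative.

```python
import math
import re


def tokenize(sentence):
    # Lower-cased alphanumeric tokens; a crude stand-in for GGSDS's pre-processing step
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(a, b):
    # Assumed edge weight: shared distinct words normalised by log sentence lengths
    # (a TextRank-style similarity, not GGSDS's accumulative-graph shared phrases)
    wa, wb = set(tokenize(a)), set(tokenize(b))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # skip one-word/empty sentences so the log denominator stays positive
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, d=0.85, iterations=50):
    # Weighted TextRank: WS(i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(j)
    n = len(sentences)
    w = [[overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):  # fixed iteration count instead of a convergence test
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, clusters, d=0.85):
    # clusters: hypothetical G-GSOM output, one list of sentence indices per sub-topic
    picked = []
    for cluster in clusters:
        if not cluster:
            continue
        # Rank sentences inside the cluster's own sub-graph
        scores = textrank([sentences[i] for i in cluster], d=d)
        best = max(range(len(cluster)), key=lambda k: scores[k])
        picked.append(cluster[best])
    # Emit the selected sentences in their original document order
    return [sentences[i] for i in sorted(picked)]


if __name__ == "__main__":
    doc = ["Cats chase mice.", "Cats chase mice and birds eat seeds.",
           "Birds eat seeds.", "Rain falls today."]
    print(summarize(doc, [[0, 1, 2], [3]]))
    # → ['Cats chase mice and birds eat seeds.', 'Rain falls today.']
```

In the usage example, sentence 1 shares words with both sentences 0 and 2, which share none with each other. Sentence 1 therefore receives the highest score in its cluster and represents that sub-topic. Scoring within each cluster, rather than over the whole document graph, is a design choice made for this sketch. The abstract does not say which of the two GGSDS uses.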