Improving the Performance of the Extractive Text Summarization by a Novel Topic Modeling and Sentence Embedding Technique using SBERT
Paulus Setiawan Suryadjaja, Rila Mandala
2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2021-09-29
DOI: 10.1109/ICAICTA53211.2021.9640295
Citations: 2
Abstract
Given the limits of human reading capacity and the massive amount of text data now available, automatic text summarization systems are needed. One method that produces satisfactory summaries is extractive text summarization based on density peaks clustering; previous research applying this method achieved state-of-the-art results on the DUC 2004 dataset. There is still room for improvement, however, specifically by using a neural network-based sentence embedding technique in place of the vector space model embedding and LDA topic modeling used previously. This research proposes a cluster-based automatic text summarization system that uses Sentence-BERT (SBERT) for both sentence embedding and topic modeling, improving on the summarization technique of that earlier work. SBERT was chosen because it achieves state-of-the-art performance on sentence embedding tasks and is therefore expected to represent the semantic meaning of sentences better than the techniques used in previous studies. This is the first study to apply SBERT to text summarization. It also proposes several improvements to the sentence selection techniques used in previous studies. Evaluated with the ROUGE toolkit, the resulting system produced better summaries than the previous research.
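The abstract does not spell out the scoring details, so the following is only a minimal sketch of generic density peaks scoring applied to sentence vectors, not the authors' implementation. It assumes a Gaussian-kernel local density, cosine distance, and a simple density-times-separation score. The function names, the `cutoff` parameter, and the two-dimensional toy vectors (standing in for SBERT sentence embeddings) are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def density_peaks_scores(vectors, cutoff):
    """Score each vector by local density times distance to a denser vector."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    # Local density: Gaussian-kernel count of neighbours (assumed variant).
    rho = [
        sum(math.exp(-((dist[i][j] / cutoff) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Separation: distance to the nearest vector with higher density.
    # A vector with no denser neighbour gets its largest distance instead.
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))

    return [r * d for r, d in zip(rho, delta)]


def select_sentences(vectors, cutoff, k):
    """Return the indices of the k highest-scoring vectors, in document order."""
    scores = density_peaks_scores(vectors, cutoff)
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy stand-ins for sentence embeddings: two tight groups of three "sentences".
toy_vectors = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],
]
print(select_sentences(toy_vectors, cutoff=0.05, k=2))  # [2, 5]
```

On this toy input the selection picks the central vector of each group, which is the behaviour a cluster-based extractive summarizer relies on: sentences that are representative of a dense region yet far from other dense regions. In a real pipeline the vectors would be produced by an SBERT model, and the sentence selection step would also need to respect a summary length limit.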