Improving the Performance of the Extractive Text Summarization by a Novel Topic Modeling and Sentence Embedding Technique using SBERT

2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA) · Published 2021-09-29 · DOI: 10.1109/ICAICTA53211.2021.9640295

Paulus Setiawan Suryadjaja, Rila Mandala


Citations: 2

Abstract

People can only read so much, and a massive amount of text data is available today, so automatic text summarization systems are needed. One automatic summarization method that produces satisfactory summaries is extractive summarization based on density peaks clustering. A previous study that applied this method achieved state-of-the-art results on the DUC 2004 dataset. That approach can still be improved by replacing its vector space model embedding and LDA topic modeling with a neural-network-based sentence embedding technique. This research proposes a cluster-based automatic text summarization system that uses Sentence-BERT (SBERT) for both sentence embedding and topic modeling, improving on the technique proposed in the previous study. SBERT was chosen because it achieves state-of-the-art performance on sentence embedding tasks. It is therefore expected to capture the semantic meaning of sentences better than the techniques used in earlier studies. This is the first study to apply SBERT to text summarization. It also proposes several improvements to the sentence selection techniques used in previous studies. In an evaluation with the ROUGE toolkit, the system built in this study produced better summaries than the previous approach.
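The abstract does not give the scoring or sentence-selection rules. The following is therefore only a minimal sketch of density-peaks-style sentence selection. It assumes that sentence embeddings have already been computed, for example by an SBERT encoder. Several pieces are illustrative assumptions, not the authors' method: the function names `cosine` and `density_peaks_select`, the cosine-similarity cutoff `sim_threshold`, and the simple `rho * delta` score. The SBERT-based topic modeling step and the paper's proposed selection improvements are not modeled here.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density_peaks_select(embeddings: List[List[float]], k: int,
                         sim_threshold: float = 0.5) -> List[int]:
    """Pick k sentence indices with a density-peaks score (hypothetical sketch).

    rho[i]   : number of other sentences whose cosine similarity to i exceeds
               sim_threshold (local density, a proxy for representativeness).
    delta[i] : cosine distance (1 - similarity) to the nearest sentence ranked
               denser than i; the densest sentence instead gets its largest
               distance to any other sentence (a proxy for diversity).
    score    : rho[i] * delta[i] (an assumed combination); the k best indices
               are returned in document order.
    """
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and sim[i][j] > sim_threshold)
           for i in range(n)]

    # Rank by density (ties broken by position) so that every sentence except
    # the first has a well-defined set of "denser" sentences.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max((1.0 - sim[i][j] for j in range(n) if j != i), default=0.0)
        else:
            delta[i] = min(1.0 - sim[i][j] for j in order[:pos])

    scores = [rho[i] * delta[i] for i in range(n)]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


if __name__ == "__main__":
    # Toy 2-D "sentence embeddings": sentences 0-2 and 3-4 form two topics.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
    print(density_peaks_select(toy, k=2, sim_threshold=0.9))  # [0, 3]
```

On the toy input, the sketch picks one sentence from each topic. The densest sentence in each group also lies far from every denser sentence, so it scores highest. With real text, the vectors could come from the `sentence-transformers` package, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(sentences).tolist()`. That model name is only an example and is not necessarily the one used in the study.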

