Hybrid model for extractive single document summarization: utilizing BERTopic and BERT model
M. Maryanto, Philips Philips, Abba Suganda Girsang
IAES International Journal of Artificial Intelligence (IJ-AI), journal article, published 2024-06-01
DOI: 10.11591/ijai.v13.i2.pp1723-1731 (https://doi.org/10.11591/ijai.v13.i2.pp1723-1731)
Citation count: 0
Abstract
Extractive text summarization has been an active research area for many years. The task is to produce a compact, coherent summary of a given document that preserves its most important information. Current extractive methods, however, still struggle with semantic drift, repetition, redundancy, and lack of coherence. This paper presents a novel approach that improves an extractive summarization model based on Bidirectional Encoder Representations from Transformers (BERT) by incorporating topic modeling with BERTopic. The method first uses BERTopic to identify the dominant topics in a document, then employs a BERT-based deep neural network to extract the most salient sentences related to those topics. Experiments on the CNN/Daily Mail dataset show that the proposed method outperforms state-of-the-art BERT-based extractive summarization models on Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores, with increases of 32.53% in ROUGE-1, 47.55% in ROUGE-2, and 16.63% in ROUGE-L over baseline BERT-based extractive summarization models. The paper contributes to the field of extractive text summarization, highlights the potential of topic modeling for improving summarization results, and suggests a new direction for future research.
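To make the reported metrics concrete, the following is a minimal sketch of how ROUGE-N scores are commonly computed from n-gram overlap between a candidate summary and a reference summary. It is an illustration only, not the authors' evaluation code: the function names `ngrams` and `rouge_n`, the whitespace tokenization, the lowercasing, and the toy sentences are all assumptions, and the abstract does not state which ROUGE variant or preprocessing was used. ROUGE-L, which relies on the longest common subsequence instead of fixed-length n-grams, is not covered here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N recall, precision, and F1 for two plain-text strings.

    Tokenization is a simple lowercase whitespace split (an assumption made
    for this sketch; real evaluations often add stemming or other steps).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it appears
    # in both the candidate and the reference.
    overlap = sum((cand & ref).values())

    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}


# Toy example (hypothetical sentences, not taken from CNN/Daily Mail).
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"

scores_1 = rouge_n(candidate, reference, n=1)  # unigram overlap (ROUGE-1)
scores_2 = rouge_n(candidate, reference, n=2)  # bigram overlap (ROUGE-2)
print(scores_1)
print(scores_2)
```

In the toy example, five of the six reference unigrams and three of the five reference bigrams also appear in the candidate, so ROUGE-2 is lower than ROUGE-1. This is typical: higher-order n-grams are harder to match, which is why the two scores are usually reported side by side.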