A Hybrid Approach for Automatic Text Summarization by Handling Out-of-Vocabulary Words Using TextR-BLG Pointer Algorithm

IF 0.4 Q4 INFORMATION SCIENCE & LIBRARY SCIENCE
Sonali Mhatre, Lata L. Ragha
Journal article, published 2024-05-20. DOI: 10.3103/s0147688224010106 (https://doi.org/10.3103/s0147688224010106)
Citations: 0

Abstract

Long documents, such as scientific papers and government reports, often discuss substantial issues at length and are time-consuming to read and understand. Generating abstractive summaries can help readers quickly grasp the main topics, yet prior work has mostly focused on short texts and suffers from drawbacks such as out-of-vocabulary (OOV) words, mismatched sentences, and meaningless summaries. To overcome these issues, a hybrid approach for automatic text summarization using the TextR-BLG pointer algorithm is proposed. In the designed model, a long document is given as input and its sentences are evaluated for word frequency length; based on a threshold value, the sentences are split between an extractive and an abstractive approach. The text algorithm used in this model computes the sentence similarity score, which is validated with the plotted graph. Likewise, the sentences above the threshold level are handled by the abstractive approach. The optimized BERT-LSTM-BiGRU (BLG) based pointer algorithm learns the meaning of the sentences through word embedding and through encoding and decoding the hidden states. Finally, the reframed sentences are used to obtain the similarity score and plot the graph. The sentences from the abstractive and extractive approaches are ranked based on the plotted graph to generate the summary. In the performance evaluation, the model's ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, BLEU, and METEOR scores are 59.2, 58.4, 62.3, 0.92, 0.78, and 0.67, respectively, and these are compared with the existing model. The evaluation of the proposed and existing summarization techniques shows that the proposed model attains better text summarization than the existing model. Thus, by handling out-of-vocabulary words, the hybrid approach for automatic text summarization using the TextR-BLG pointer algorithm performs better than the existing text summarization techniques.
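
The routing and graph-ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define "word frequency length", so the sketch assumes it is a plain token count; the similarity measure (bag-of-words cosine) and the PageRank-style iteration are stand-ins for the unspecified text algorithm; and the names `split_by_length` and `rank_sentences` are hypothetical. The BLG pointer model that rewrites the long sentences is omitted entirely.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def split_by_length(sentences, threshold):
    """Route sentences by token count (an assumed reading of the abstract).

    Sentences longer than `threshold` go to the abstractive branch,
    the rest to the extractive branch.
    """
    extractive, abstractive = [], []
    for s in sentences:
        target = abstractive if len(tokenize(s)) > threshold else extractive
        target.append(s)
    return extractive, abstractive


def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Rank sentences with a PageRank-style walk over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Edge weight between two distinct sentences is their cosine similarity.
    weights = [
        [cosine(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    # Highest-scoring sentences first.
    return sorted(zip(sentences, scores), key=lambda p: p[1], reverse=True)
```

In a full pipeline along the lines the abstract describes, the sentences in the abstractive branch would first be rewritten by the pointer model, and the rewritten sentences would then be ranked together with the extractive ones before the top-ranked sentences are selected for the summary.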

Source journal
Scientific and Technical Information Processing (Information Science & Library Science)
CiteScore: 1.00
Self-citation rate: 42.90%
Articles published: 20
Journal description: Scientific and Technical Information Processing is a refereed journal that covers all aspects of management and use of information technology in libraries and archives, information centres, and the information industry in general. Emphasis is on practical applications of new technologies and techniques for information analysis and processing.