Stacked Denoising Variational Auto Encoder Model for Extractive Web Text Summarization

IF 1.4, Zone 4 (Engineering and Technology), Q3, ENGINEERING, ELECTRICAL & ELECTRONIC

Iranian Journal of Science and Technology-Transactions of Electrical Engineering, Pub Date: 2024-09-13, DOI: 10.1007/s40998-024-00751-9

Madhuri Yadav, Rahul Katarya


Citations: 0

Abstract

Extractive summarization is the technique of extracting distilled content from a corpus and concatenating it into a summary. Extractive summarization of web text has recently become popular because of the wide use of social media. Much research has therefore been conducted on it, but processing huge amounts of web text and understanding its context remains difficult because of the storage and time required. To address this, a continuous bag-of-words (CBOW) text vectorization model is used, which reduces processing time by producing a distributed vector representation of word combinations. However, polysemous words are not captured, which makes extraction difficult. A novel Hierarchical Attention Pointer Stacked Denoising Variational Autoencoder (SDVAE) model is therefore proposed, in which the SDVAE forms a latent distribution for contextualized words and a bidirectional attention mechanism extracts keywords and features from sentences, thereby capturing polysemous words. Furthermore, the resulting summary contains dangling anaphora, because antecedent morphological expressions and verb referents are not considered. A novel Multilayered Competitive Probable Modular Perception Model is therefore proposed, in which a competitive layer scores the sentences and the scored sentences are ranked using a string kernel and class-conditional probability, thereby taking the antecedent morphological expressions into account. Graph-based Quadruplicate Lexicon Summarization is then applied, which forms quadruplicate lexicon chains in graph form to eliminate dangling anaphoric expressions. The experimental results show that the proposed model achieves a comparatively high accuracy of 98.3% and recall, precision, and F-measure of 98%.
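To make the terms in the abstract concrete, the following is a minimal Python sketch of the generic extractive pipeline it describes: score sentences, rank them, concatenate the top ones, then evaluate with precision, recall, and F-measure. It is not the authors' model. The word-frequency scorer is a deliberately simple stand-in for the SDVAE and competitive-layer scoring, and the function names (`split_sentences`, `extractive_summary`, `precision_recall_f1`) and the parameter `k` are hypothetical names chosen for this illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extractive_summary(text, k=2):
    """Score each sentence by the mean corpus frequency of its words,
    keep the k best, and return them in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # Rank sentence indices by score (assumed scorer, not the paper's).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore document order before concatenating
    return [sentences[i] for i in chosen]


def precision_recall_f1(selected, reference):
    """Sentence-level precision, recall and F-measure against a reference set."""
    selected, reference = set(selected), set(reference)
    tp = len(selected & reference)  # sentences present in both sets
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A summary would be produced with `" ".join(extractive_summary(text, k=2))` and compared against a human-selected set of reference sentences via `precision_recall_f1`. How the paper itself computes its reported accuracy, precision, recall, and F-measure is not stated in the abstract, so the sentence-level definition used here is an assumption.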


Source journal

Iranian Journal of Science and Technology-Transactions of Electrical Engineering (ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore

5.50

Self-citation rate

4.20%

Articles published

Review time

>12 weeks

Journal description: The aim of Transactions of Electrical Engineering is to foster the growth of scientific research in all branches of electrical engineering and related fields, and to provide a medium through which the results of this research may be brought to the attention of the world's scientific communities. The journal focuses on frontier topics in theoretical, mathematical, numerical, experimental and scientific developments in electrical engineering, as well as applications of established techniques to new domains in various electrical engineering disciplines such as: Bioelectrics, Biomechanics, Bioinstrumentation, Microwaves, Wave Propagation, Communication Theory, Channel Estimation, Radar & Sonar Systems, Signal Processing, Image Processing, Artificial Neural Networks, Data Mining and Machine Learning, Fuzzy Logic and Systems, Fuzzy Control, Optimal & Robust Control, Navigation & Estimation Theory, Power Electronics & Drives, and Power Generation & Management. The editors welcome papers from professors and researchers at universities, research centers, organizations, companies and industries all over the world, in the hope that this will advance the scientific standards of the journal and provide a channel of communication between Iranian scholars and their colleagues in other parts of the world.