Stacked Denoising Variational Auto Encoder Model for Extractive Web Text Summarization
Authors: Madhuri Yadav, Rahul Katarya
Journal: Iranian Journal of Science and Technology-Transactions of Electrical Engineering (Impact Factor 1.5; JCR Q3, Engineering, Electrical & Electronic)
Publication type: Journal Article
Published: 2024-09-13
DOI: 10.1007/s40998-024-00751-9 (https://doi.org/10.1007/s40998-024-00751-9)
Citation count: 0
Abstract
Extractive summarization is the technique of extracting distilled content from a corpus and concatenating it into a summary. Extractive summarization of web text has recently become popular because of the wide use of social media. Much research has therefore been conducted on it, but processing huge amounts of web text and understanding the context remain difficult because of the storage and time required. To address this, a continuous bag-of-words (CBOW) text vectorization model has been used, which reduces processing time by producing distributed vector representations of words. However, polysemous words are not captured, which makes extraction difficult. Hence a novel Hierarchical Attention Pointer Stacked Denoising Variational Autoencoder (SDVAE) model is proposed, in which the SDVAE forms a latent distribution over contextualized words and a bidirectional attention mechanism extracts keywords and features from sentences, thereby capturing polysemous words. Furthermore, the resulting summary contains dangling anaphora because antecedent morphological expressions and verb referents are not considered. Hence a novel Multilayered Competitive Probable Modular Perception Model is proposed, in which a competitive layer scores the sentences and the scored sentences are ranked using a string kernel and class-conditional probability, thereby taking antecedent morphological expressions into account. Graph-based Quadruplicate Lexicon Summarization is then applied, which forms a quadruplicate lexicon chain in graph form to eliminate dangling anaphoric expressions. The experimental results show that the proposed model achieves a comparatively high accuracy of 98.3% and recall, precision, and F-measure of 98%.
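To make the "extract and concatenate" idea in the abstract concrete, the following is a minimal Python sketch of generic extractive summarization. It is not the authors' model: the word-frequency score stands in for the SDVAE, attention, and competitive-layer scoring described above, and the names `split_sentences` and `extractive_summary` as well as the parameter `k` are assumptions made purely for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, concatenated in document order."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenized for w in words)

    # Stand-in score (an assumption, not the paper's scoring):
    # mean corpus frequency of the words in the sentence.
    def score(words):
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, highest first (ties keep document order).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]),
                    reverse=True)
    chosen = sorted(ranked[:k])  # restore document order before concatenating
    return " ".join(sentences[i] for i in chosen)
```

Calling `extractive_summary(text, k=2)` returns the two top-ranked sentences of `text` joined in their original order. Any sentence scorer with the same interface could replace `score`, which is where a learned model would plug in.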
About the journal:
Transactions of Electrical Engineering aims to foster the growth of scientific research in all branches of electrical engineering and related fields, and to provide a medium through which the results of this research may be brought to the attention of the world's scientific communities.
The journal focuses on frontier topics in theoretical, mathematical, numerical, experimental, and scientific developments in electrical engineering, as well as applications of established techniques to new domains in various electrical engineering disciplines, such as:
Bioelectrics, Biomechanics, Bioinstrumentation, Microwaves, Wave Propagation, Communication Theory, Channel Estimation, Radar & Sonar Systems, Signal Processing, Image Processing, Artificial Neural Networks, Data Mining and Machine Learning, Fuzzy Logic and Systems, Fuzzy Control, Optimal & Robust Control, Navigation & Estimation Theory, Power Electronics & Drives, Power Generation & Management.
The editors welcome papers from professors and researchers at universities, research centers, organizations, companies, and industries around the world, in the hope that this will advance the scientific standards of the journal and provide a channel of communication between Iranian scholars and their colleagues in other parts of the world.