Satya Deo, Debajyoty Banik, Prasant Kumar Pattnaik
DOI: 10.1016/j.datak.2025.102440
Journal: Data & Knowledge Engineering, Volume 159, Article 102440
Published: 2025-03-25 (Journal Article)
JCR: Q3, Computer Science, Artificial Intelligence; Impact Factor: 2.7
Available at: https://www.sciencedirect.com/science/article/pii/S0169023X25000357
Customized long short-term memory architecture for multi-document summarization with improved text feature set
Multi-Document Summarization (MDS) is among the most crucial concerns in the domain of Natural Language Processing (NLP), and attention to it has risen markedly in recent decades. It is therefore vital for the NLP community to provide effective and reliable MDS methods. Current deep-learning-based MDS techniques rely on the extraordinary capacity of neural networks to extract distinctive features. Motivated by this, we introduce a novel MDS technique, Customized Long Short-Term Memory-based Multi-Document Summarization using IBi-GRU (CLSTM-MDS+IBi-GRU), which comprises the following working processes. First, the input data is converted into tokens by the BERT (Bidirectional Encoder Representations from Transformers) tokenizer. Features such as Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), thematic features, and an improved aspect-term-based feature are then extracted. Finally, summarization is performed by concatenating the Customized Long Short-Term Memory (CLSTM) with a pre-eminent layer. An accurate, high-quality summary is obtained by introducing this layer into the LSTM module and the Bi-GRU-based Inception module (IBi-GRU), which captures long-range dependencies through parallel convolutions. The outcomes of this work demonstrate the superiority of our CLSTM-MDS in the multi-document summarization task.
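The abstract does not include implementation details for the feature-extraction stage. As a minimal illustration of two of the listed features, the sketch below computes BoW and TF-IDF vectors for a toy multi-document corpus in pure Python; the tokenization, corpus, and `tf * log(N/df)` weighting scheme are illustrative assumptions, not the authors' exact formulation.

```python
import math
from collections import Counter

def bow(doc_tokens, vocab):
    """Bag-of-Words: raw term counts over a fixed vocabulary."""
    counts = Counter(doc_tokens)
    return [counts[w] for w in vocab]

def tf_idf(doc_tokens, corpus, vocab):
    """TF-IDF with tf = relative frequency, idf = log(N / df)."""
    N = len(corpus)
    counts = Counter(doc_tokens)
    vec = []
    for w in vocab:
        tf = counts[w] / len(doc_tokens)
        df = sum(1 for d in corpus if w in d)  # documents containing w
        idf = math.log(N / df) if df else 0.0
        vec.append(tf * idf)
    return vec

# Toy multi-document corpus (pre-tokenized, lowercase).
corpus = [
    ["neural", "networks", "extract", "features"],
    ["neural", "summarization", "of", "documents"],
]
vocab = sorted({w for d in corpus for w in d})

bow_vec = bow(corpus[0], vocab)
tfidf_vec = tf_idf(corpus[1], corpus, vocab)
```

Terms appearing in every document (here "neural") receive an IDF of zero, so TF-IDF downweights corpus-wide words while BoW keeps raw counts; in the paper's pipeline these vectors would be combined with the thematic and improved aspect-term features before being fed to the CLSTM.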
Journal introduction:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between the two related fields of data engineering and knowledge engineering. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.