用于多文档摘要的定制化长短时记忆架构，具有改进的文本特征集

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering Pub Date : 2025-03-25 DOI:10.1016/j.datak.2025.102440

Satya Deo , Debajyoty Banik , Prasant Kumar Pattnaik

{"title":"用于多文档摘要的定制化长短时记忆架构，具有改进的文本特征集","authors":"Satya Deo , Debajyoty Banik , Prasant Kumar Pattnaik","doi":"10.1016/j.datak.2025.102440","DOIUrl":null,"url":null,"abstract":"<div><div>One <strong>a</strong>mong the most crucial concerns in the domain of Natural Language Processing (NLP) is the Multi-Document Summarization (MDS) and in recent decades, the focus on this issue has risen massively. Hence, it is vital for the NLP community to provide effective and reliable MDS methods. Current deep learning-dependent MDS techniques rely on the extraordinary capacity of neural networks, in order to extract distinctive features. Motivated by this fact, we introduce a novel MDS technique, named as Customized Long Short-Term Memory-based Multi-Document Summarization using IBi-GRU <strong>(</strong>CLSTM-MDS+IBi-GRU), which includes the following working processes. Firstly, the input data gets converted into tokens by the Bi-directional Transformer (BERT) tokenizer. The features, such as Term Frequency- Inverse Document Frequency (TF-IDF), Bag of Words (BoW), thematic features and an improved aspect term-based feature are then extracted afterwards. Finally, the summarization process takes place by utilizing the concatenation of Customized Long Short-Term Memory (CLSTM) with a pre-eminent layer. Accurate and high-quality summary is provided via introducing this layer in the LSTM module and the Bi-GRU-based Inception module (IBi-GRU), which can capture long range dependences through parallel convolution. The outcomes of this work prove the superiority of our CLSTM-MDS in the Multi-Document Summarization task.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102440"},"PeriodicalIF":2.7000,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Customized long short-term memory architecture for multi-document summarization with improved text feature set\",\"authors\":\"Satya Deo , Debajyoty Banik , Prasant Kumar Pattnaik\",\"doi\":\"10.1016/j.datak.2025.102440\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>One <strong>a</strong>mong the most crucial concerns in the domain of Natural Language Processing (NLP) is the Multi-Document Summarization (MDS) and in recent decades, the focus on this issue has risen massively. Hence, it is vital for the NLP community to provide effective and reliable MDS methods. Current deep learning-dependent MDS techniques rely on the extraordinary capacity of neural networks, in order to extract distinctive features. Motivated by this fact, we introduce a novel MDS technique, named as Customized Long Short-Term Memory-based Multi-Document Summarization using IBi-GRU <strong>(</strong>CLSTM-MDS+IBi-GRU), which includes the following working processes. Firstly, the input data gets converted into tokens by the Bi-directional Transformer (BERT) tokenizer. The features, such as Term Frequency- Inverse Document Frequency (TF-IDF), Bag of Words (BoW), thematic features and an improved aspect term-based feature are then extracted afterwards. Finally, the summarization process takes place by utilizing the concatenation of Customized Long Short-Term Memory (CLSTM) with a pre-eminent layer. Accurate and high-quality summary is provided via introducing this layer in the LSTM module and the Bi-GRU-based Inception module (IBi-GRU), which can capture long range dependences through parallel convolution. The outcomes of this work prove the superiority of our CLSTM-MDS in the Multi-Document Summarization task.</div></div>\",\"PeriodicalId\":55184,\"journal\":{\"name\":\"Data & Knowledge Engineering\",\"volume\":\"159 \",\"pages\":\"Article 102440\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data & Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169023X25000357\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X25000357","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

自然语言处理（NLP）领域中最重要的问题之一是多文档摘要（MDS），近几十年来，这一问题得到了广泛关注。因此，提供有效可靠的MDS方法对NLP社区至关重要。当前依赖深度学习的MDS技术依赖于神经网络的非凡能力，以提取显著特征。基于此，我们提出了一种新的基于IBi-GRU的基于定制长短期记忆的多文档摘要技术（CLSTM-MDS+IBi-GRU），包括以下工作流程。首先，输入数据通过双向转换器（BERT）标记器转换为标记。然后提取词频-逆文档频率（TF-IDF）、词包（BoW）、主题特征和改进的基于词的方面特征。最后，总结过程通过利用自定义长短期记忆（CLSTM）与卓越层的连接进行。通过在LSTM模块和基于bi - gru的Inception模块（IBi-GRU）中引入该层，可以提供准确和高质量的摘要，该模块可以通过并行卷积捕获远程依赖关系。研究结果证明了我们的CLSTM-MDS在多文档摘要任务中的优越性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Customized long short-term memory architecture for multi-document summarization with improved text feature set

One among the most crucial concerns in the domain of Natural Language Processing (NLP) is the Multi-Document Summarization (MDS) and in recent decades, the focus on this issue has risen massively. Hence, it is vital for the NLP community to provide effective and reliable MDS methods. Current deep learning-dependent MDS techniques rely on the extraordinary capacity of neural networks, in order to extract distinctive features. Motivated by this fact, we introduce a novel MDS technique, named as Customized Long Short-Term Memory-based Multi-Document Summarization using IBi-GRU (CLSTM-MDS+IBi-GRU), which includes the following working processes. Firstly, the input data gets converted into tokens by the Bi-directional Transformer (BERT) tokenizer. The features, such as Term Frequency- Inverse Document Frequency (TF-IDF), Bag of Words (BoW), thematic features and an improved aspect term-based feature are then extracted afterwards. Finally, the summarization process takes place by utilizing the concatenation of Customized Long Short-Term Memory (CLSTM) with a pre-eminent layer. Accurate and high-quality summary is provided via introducing this layer in the LSTM module and the Bi-GRU-based Inception module (IBi-GRU), which can capture long range dependences through parallel convolution. The outcomes of this work prove the superiority of our CLSTM-MDS in the Multi-Document Summarization task.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Data & Knowledge Engineering 工程技术-计算机：人工智能

CiteScore

5.00

自引率

0.00%

发文量

审稿时长

6 months

期刊介绍： Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.