Recurrent Attention Networks for Long-text Modeling

Annual Meeting of the Association for Computational Linguistics. Pub Date: 2023-06-12. DOI: 10.48550/arXiv.2306.06843

Xianming Li, Zongxi Li, Xiaotian Luo, Haoran Xie, Xing Lee, Yingbin Zhao, Fu Lee Wang, Qing Li

Citations: 4

Abstract

Self-attention-based models have achieved remarkable progress in short-text mining. However, the quadratic computational complexity of self-attention with respect to sequence length restricts their application to long-text processing. Prior works have adopted a chunking strategy, dividing long documents into chunks and stacking a self-attention backbone with a recurrent structure to extract semantic representations. Such an approach disables parallelization of the attention mechanism, significantly increasing the training cost and raising hardware requirements. Revisiting the self-attention mechanism and the recurrent structure, this paper proposes a novel long-document encoding model, the Recurrent Attention Network (RAN), which enables the recurrent operation of self-attention. Combining the advantages of both, RAN can extract global semantics in both token-level and document-level representations, making it inherently compatible with sequential and classification tasks, respectively. Furthermore, RAN is computationally scalable because it supports parallelization in long-document processing. Extensive experiments on both classification and sequential tasks demonstrate the long-text encoding ability of the proposed RAN model, showing its potential for a wide range of applications.
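Two ideas from the abstract, the quadratic cost of full self-attention and carrying a recurrent state across document chunks, can be illustrated with a small stdlib-only Python sketch. This is a toy under stated assumptions, not the authors' RAN: the helper names (`attention_pairs_full`, `attention_pairs_chunked`, `self_attention`, `recurrent_chunk_encoder`) are hypothetical, attention is single-head with no learned projections, and the carried state is simply one memory vector prepended to each chunk. Counting query-key scores shows why chunking helps: full attention over n tokens computes n^2 scores, while fixed chunks of size c compute roughly n * c. Note that the chunks below are processed one after another, i.e. the kind of sequential chunk-level recurrence the abstract attributes to prior work; how RAN itself achieves parallelization is described in the paper and is not reproduced here.

```python
import math
from typing import List, Tuple

# Toy sketch only; NOT the RAN architecture from the paper.
# All function names are hypothetical, chosen for illustration.
Vector = List[float]


def attention_pairs_full(n: int) -> int:
    """Query-key scores computed by full self-attention over n tokens."""
    return n * n


def attention_pairs_chunked(n: int, chunk_size: int) -> int:
    """Query-key scores when attention is restricted to fixed-size chunks."""
    total = 0
    for start in range(0, n, chunk_size):
        size = min(chunk_size, n - start)  # the last chunk may be shorter
        total += size * size
    return total


def self_attention(xs: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention (no learned weights)."""
    d = len(xs[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in xs:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, xs)) / z for j in range(d)]
        )
    return outputs


def recurrent_chunk_encoder(
    tokens: List[Vector], chunk_size: int
) -> Tuple[List[Vector], Vector]:
    """Assumed toy recurrence: a memory vector is prepended to each chunk,
    attends jointly with that chunk's tokens, and its updated value is
    carried to the next chunk. Returns (token-level outputs, final memory
    used as a document-level vector)."""
    d = len(tokens[0])
    memory = [0.0] * d
    token_outputs: List[Vector] = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        out = self_attention([memory] + chunk)
        memory = out[0]                # state passed to the next chunk
        token_outputs.extend(out[1:])  # per-token representations
    return token_outputs, memory


if __name__ == "__main__":
    n, c = 4096, 512
    print(attention_pairs_full(n), attention_pairs_chunked(n, c))  # 16777216 2097152
    toks = [[math.sin(i), math.cos(i)] for i in range(10)]
    token_outputs, doc_vector = recurrent_chunk_encoder(toks, 4)
    print(len(token_outputs), len(doc_vector))  # 10 2
```

Returning both the per-token outputs and the final memory loosely mirrors the abstract's distinction between token-level representations (suited to sequential tasks) and a document-level representation (suited to classification).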
