Hierarchical Neural Network Approaches for Long Document Classification

Snehal Khandve, Vedangi Wagh, Apurva Wani, Isha Joshi, Raviraj Joshi
{"title":"Hierarchical Neural Network Approaches for Long Document Classification","authors":"Snehal Khandve, Vedangi Wagh, Apurva Wani, Isha Joshi, Raviraj Joshi","doi":"10.1145/3529836.3529935","DOIUrl":null,"url":null,"abstract":"Text classification algorithms investigate the intricate relationships between words or phrases and attempt to deduce the document’s interpretation. In the last few years, these algorithms have progressed tremendously. Transformer architecture and sentence encoders have proven to give superior results on natural language processing tasks. But a major limitation of these architectures is their applicability for text no longer than a few hundred words. In this paper, we explore hierarchical transfer learning approaches for long document classification. We employ pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently. Our proposed models are conceptually simple where we divide the input data into chunks and then pass this through base models of BERT and USE. Then output representation for each chunk is then propagated through a shallow neural network comprising of LSTMs or CNNs for classifying the text data. These extensions are evaluated on 6 benchmark datasets. We show that USE + CNN/LSTM performs better than its stand-alone baseline. Whereas the BERT + CNN/LSTM performs on par with its stand-alone counterpart. However, the hierarchical BERT models are still desirable as it avoids the quadratic complexity of the attention mechanism in BERT. Along with the hierarchical approaches, this work also provides a comparison of different deep learning algorithms like USE, BERT, HAN, Longformer, and BigBird for long document classification. The Longformer approach consistently performs well on most of the datasets.","PeriodicalId":285191,"journal":{"name":"2022 14th International Conference on Machine Learning and Computing (ICMLC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Machine Learning and Computing (ICMLC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3529836.3529935","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Text classification algorithms investigate the intricate relationships between words or phrases and attempt to deduce a document's interpretation. In the last few years, these algorithms have progressed tremendously. Transformer architectures and sentence encoders have been shown to give superior results on natural language processing tasks, but a major limitation of these architectures is that they are only applicable to text no longer than a few hundred words. In this paper, we explore hierarchical transfer learning approaches for long document classification. We employ the pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently. Our proposed models are conceptually simple: we divide the input document into chunks and pass each chunk through a base BERT or USE model. The output representation for each chunk is then propagated through a shallow neural network comprising LSTMs or CNNs to classify the text. These extensions are evaluated on six benchmark datasets. We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart. However, the hierarchical BERT models are still desirable, as they avoid the quadratic complexity of BERT's attention mechanism. Along with the hierarchical approaches, this work also provides a comparison of different deep learning models, such as USE, BERT, HAN, Longformer, and BigBird, for long document classification. The Longformer approach consistently performs well on most of the datasets.
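To make the hierarchical setup described above concrete (chunk the document, encode each chunk with a pre-trained base model, then classify the sequence of chunk representations with a shallow recurrent network), here is a minimal sketch using a Hugging Face BERT encoder and a bidirectional LSTM. The chunk length, hidden size, number of classes, and helper names below are illustrative assumptions for the sketch, not the paper's exact configuration.

```python
# Minimal sketch of a hierarchical BERT + LSTM document classifier.
# Assumptions: PyTorch + Hugging Face transformers; hyperparameters are illustrative.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class HierarchicalBertLSTM(nn.Module):
    def __init__(self, base_model="bert-base-uncased", hidden_size=128, num_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model)      # chunk-level encoder
        self.lstm = nn.LSTM(self.encoder.config.hidden_size,      # shallow LSTM over chunk embeddings
                            hidden_size, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids / attention_mask: (batch, num_chunks, chunk_len)
        b, n, l = input_ids.shape
        out = self.encoder(input_ids=input_ids.view(b * n, l),
                           attention_mask=attention_mask.view(b * n, l))
        chunk_emb = out.last_hidden_state[:, 0]        # [CLS] embedding per chunk
        chunk_emb = chunk_emb.view(b, n, -1)           # (batch, num_chunks, dim)
        _, (h_n, _) = self.lstm(chunk_emb)             # summarize the chunk sequence
        doc_emb = torch.cat([h_n[-2], h_n[-1]], dim=-1)  # final fwd/bwd hidden states
        return self.classifier(doc_emb)


def chunk_document(text, tokenizer, words_per_chunk=150, max_chunks=10, chunk_len=200):
    """Split a long document into word chunks and tokenize each chunk separately."""
    words = text.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)][:max_chunks]
    enc = tokenizer(chunks, padding="max_length", truncation=True,
                    max_length=chunk_len, return_tensors="pt")
    # Add a batch dimension: (1, num_chunks, chunk_len)
    return enc["input_ids"].unsqueeze(0), enc["attention_mask"].unsqueeze(0)


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = HierarchicalBertLSTM()
    ids, mask = chunk_document("a very long document " * 500, tokenizer)
    logits = model(ids, mask)   # shape: (1, num_classes)
    print(logits.shape)
```

A CNN over the chunk embeddings (as in the paper's BERT/USE + CNN variants) would simply replace the LSTM with one or more 1-D convolution and pooling layers; the chunk-encoding step stays the same.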