Effectively Leveraging BERT for Legal Document Classification

Nut Limsopatham
{"title":"Effectively Leveraging BERT for Legal Document Classification","authors":"Nut Limsopatham","doi":"10.18653/v1/2021.nllp-1.22","DOIUrl":null,"url":null,"abstract":"Bidirectional Encoder Representations from Transformers (BERT) has achieved state-of-the-art performances on several text classification tasks, such as GLUE and sentiment analysis. Recent work in the legal domain started to use BERT on tasks, such as legal judgement prediction and violation prediction. A common practise in using BERT is to fine-tune a pre-trained model on a target task and truncate the input texts to the size of the BERT input (e.g. at most 512 tokens). However, due to the unique characteristics of legal documents, it is not clear how to effectively adapt BERT in the legal domain. In this work, we investigate how to deal with long documents, and how is the importance of pre-training on documents from the same domain as the target task. We conduct experiments on the two recent datasets: ECHR Violation Dataset and the Overruling Task Dataset, which are multi-label and binary classification tasks, respectively. Importantly, on average the number of tokens in a document from the ECHR Violation Dataset is more than 1,600. While the documents in the Overruling Task Dataset are shorter (the maximum number of tokens is 204). We thoroughly compare several techniques for adapting BERT on long documents and compare different models pre-trained on the legal and other domains. Our experimental results show that we need to explicitly adapt BERT to handle long documents, as the truncation leads to less effective performance. We also found that pre-training on the documents that are similar to the target task would result in more effective performance on several scenario.","PeriodicalId":191237,"journal":{"name":"Proceedings of the Natural Legal Language Processing Workshop 2021","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Natural Legal Language Processing Workshop 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2021.nllp-1.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15

Abstract

Bidirectional Encoder Representations from Transformers (BERT) has achieved state-of-the-art performance on several text classification tasks, such as the GLUE benchmark and sentiment analysis. Recent work in the legal domain has started to apply BERT to tasks such as legal judgement prediction and violation prediction. A common practice when using BERT is to fine-tune a pre-trained model on a target task and to truncate the input texts to the size of the BERT input (e.g. at most 512 tokens). However, due to the unique characteristics of legal documents, it is not clear how to effectively adapt BERT to the legal domain. In this work, we investigate how to deal with long documents and how important it is to pre-train on documents from the same domain as the target task. We conduct experiments on two recent datasets: the ECHR Violation Dataset and the Overruling Task Dataset, which are multi-label and binary classification tasks, respectively. Importantly, documents in the ECHR Violation Dataset contain more than 1,600 tokens on average, while documents in the Overruling Task Dataset are much shorter (at most 204 tokens). We thoroughly compare several techniques for adapting BERT to long documents, and we compare models pre-trained on the legal domain and on other domains. Our experimental results show that BERT must be explicitly adapted to handle long documents, as truncation leads to less effective performance. We also find that pre-training on documents similar to those of the target task results in more effective performance in several scenarios.
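
To make the long-document issue concrete, the sketch below shows one common chunk-and-pool way of adapting BERT to documents longer than its 512-token input limit: the text is split into windows that fit the BERT input, each window is encoded separately, and the per-window [CLS] vectors are averaged before a classification head. This is a minimal sketch using the Hugging Face transformers API under stated assumptions (the bert-base-uncased checkpoint, mean pooling, and an illustrative label count); it is not necessarily the specific adaptation technique evaluated in the paper.

```python
# Minimal chunk-and-pool sketch for classifying documents longer than 512 tokens.
# Assumptions (not from the paper): bert-base-uncased, mean pooling over chunks,
# and an illustrative multi-label head with 33 outputs.
import torch
from transformers import BertModel, BertTokenizerFast

MODEL_NAME = "bert-base-uncased"  # assumption: a legal-domain checkpoint could be swapped in
CHUNK_LEN = 512                   # BERT's maximum sequence length, including [CLS] and [SEP]
NUM_LABELS = 33                   # illustrative label count for a multi-label task

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
bert = BertModel.from_pretrained(MODEL_NAME)
classifier = torch.nn.Linear(bert.config.hidden_size, NUM_LABELS)

def classify_long_document(text: str) -> torch.Tensor:
    """Encode a document of arbitrary length by chunking it, mean-pool the
    per-chunk [CLS] vectors, and return one logit per label."""
    # Tokenize the whole document without truncation so no text is discarded.
    token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    body_len = CHUNK_LEN - 2  # leave room for [CLS] and [SEP] in each chunk
    chunks = [token_ids[i:i + body_len] for i in range(0, len(token_ids), body_len)]
    cls_vectors = []
    with torch.no_grad():
        for chunk in chunks:
            input_ids = torch.tensor(
                [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
            )
            output = bert(input_ids=input_ids)
            cls_vectors.append(output.last_hidden_state[:, 0, :])  # [CLS] embedding
    pooled = torch.cat(cls_vectors, dim=0).mean(dim=0)  # average over all chunks
    return classifier(pooled)  # apply a sigmoid per label for multi-label prediction
```

For comparison, the plain truncation baseline would simply call the tokenizer with truncation=True and max_length=512, discarding everything after the first 512 tokens, which is the behaviour the paper reports as less effective on long documents such as those in the ECHR Violation Dataset.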