Proceedings of the Natural Legal Language Processing Workshop 2021: Latest Publications

Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity

Proceedings of the Natural Legal Language Processing Workshop 2021 Pub Date : 1900-01-01 DOI: 10.18653/v1/2021.nllp-1.12

Li Tang, S. Clematide

Abstract: Searching for legal documents is a specialized Information Retrieval task that is relevant for expert users (lawyers and their assistants) and for non-expert users. By searching previous court decisions (cases), a user can better prepare the legal reasoning of a new case. Being able to search using a natural language text snippet instead of a more artificial query could help to prevent query formulation issues. Also, if semantic similarity could be modeled beyond exact lexical matches, more relevant results could be found even if the query terms don’t match exactly. For this domain, we formulated a task to compare different ways of modeling semantic similarity at paragraph level, using neural and non-neural systems. We compared systems that encode the query and the search collection paragraphs as vectors, enabling the use of cosine similarity for results ranking. After building a German dataset for cases and statutes from Switzerland, and extracting citations from cases to statutes, we developed an algorithm for estimating semantic similarity at paragraph level, using a link-based similarity method. When evaluating different systems in this way, we find that semantic similarity modeling by neural systems can be boosted with an extended attention mask that quenches noise in the inputs.
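The abstract above describes encoding the query and the collection paragraphs as vectors and ranking results by cosine similarity. A minimal sketch of that ranking step follows. The function names, paragraph identifiers, and three-dimensional toy vectors are hypothetical and chosen only for illustration; the paper's actual encoders, its link-based labeling algorithm, and its extended attention mask are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_paragraphs(query_vec, paragraph_vecs):
    """Return (paragraph_id, score) pairs sorted by descending similarity."""
    scored = [
        (pid, cosine_similarity(query_vec, vec))
        for pid, vec in paragraph_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up embeddings; a real system would obtain them from an encoder.
paragraphs = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.6, 0.8, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
query = [1.0, 1.0, 0.0]

ranking = rank_paragraphs(query, paragraphs)
print([pid for pid, _ in ranking])
```

Because cosine similarity ignores vector length, paragraphs of different sizes can be compared on direction alone, which is one reason it is a common choice for ranking encoded text.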

Citations: 8

Effectively Leveraging BERT for Legal Document Classification

Proceedings of the Natural Legal Language Processing Workshop 2021 Pub Date : 1900-01-01 DOI: 10.18653/v1/2021.nllp-1.22

Nut Limsopatham

Abstract: Bidirectional Encoder Representations from Transformers (BERT) has achieved state-of-the-art performance on several text classification tasks, such as GLUE and sentiment analysis. Recent work in the legal domain has started to use BERT on tasks such as legal judgement prediction and violation prediction. A common practice in using BERT is to fine-tune a pre-trained model on a target task and truncate the input texts to the size of the BERT input (e.g. at most 512 tokens). However, due to the unique characteristics of legal documents, it is not clear how to effectively adapt BERT in the legal domain. In this work, we investigate how to deal with long documents, and how important it is to pre-train on documents from the same domain as the target task. We conduct experiments on two recent datasets: the ECHR Violation Dataset and the Overruling Task Dataset, which are multi-label and binary classification tasks, respectively. Importantly, the average number of tokens in a document from the ECHR Violation Dataset is more than 1,600, while the documents in the Overruling Task Dataset are shorter (the maximum number of tokens is 204). We thoroughly compare several techniques for adapting BERT on long documents and compare different models pre-trained on the legal and other domains. Our experimental results show that we need to explicitly adapt BERT to handle long documents, as truncation leads to less effective performance. We also found that pre-training on documents that are similar to the target task results in more effective performance in several scenarios.
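To make the contrast between truncation and explicit long-document handling concrete, the sketch below compares cutting a token sequence at 512 tokens with splitting it into overlapping windows. The abstract does not say which adaptation techniques were compared, so the sliding-window approach, the function names, and the `stride` value are assumptions for illustration only. The 1,600-token document is a stand-in for the ECHR average mentioned above, and a real BERT input would also reserve positions for special tokens.

```python
def truncate(tokens, max_len=512):
    """Keep only the first max_len tokens; everything after is discarded."""
    return tokens[:max_len]


def split_into_windows(tokens, max_len=512, stride=256):
    """Split tokens into overlapping windows so no token is dropped.

    Consecutive windows start `stride` tokens apart, so with
    stride < max_len they overlap by max_len - stride tokens.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return windows


# Placeholder tokens standing in for a long legal document.
doc = [f"tok{i}" for i in range(1600)]

kept = truncate(doc)
windows = split_into_windows(doc)

print(len(kept), "of", len(doc), "tokens survive truncation")
print(len(windows), "windows cover the whole document")
```

Each window could then be encoded separately and the per-window outputs combined (for example by pooling), which is one way a fixed-length model can see the whole document.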

Citations: 15

Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector

Proceedings of the Natural Legal Language Processing Workshop 2021 Pub Date : 1900-01-01 DOI: 10.18653/v1/2021.nllp-1.10

Rajdeep Sarkar, Atul Kr. Ojha, Jay Megaro, J. Mariano, Vall Herard, John P. Mccrae

Citations: 7

Automating Claim Construction in Patent Applications: The CMUmine Dataset

Proceedings of the Natural Legal Language Processing Workshop 2021 Pub Date : 1900-01-01 DOI: 10.18653/v1/2021.nllp-1.21

O. Tonguz, Yiwei Qin, Yimeng Gu, Hyun Hannah Moon

Citations: 2