Proceedings of the Natural Legal Language Processing Workshop 2021 — Latest Publications

Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity
Li Tang, S. Clematide
Proceedings of the Natural Legal Language Processing Workshop 2021. DOI: 10.18653/v1/2021.nllp-1.12

Abstract: Searching for legal documents is a specialized Information Retrieval task that is relevant for expert users (lawyers and their assistants) and for non-expert users. By searching previous court decisions (cases), a user can better prepare the legal reasoning of a new case. Being able to search using a natural language text snippet instead of a more artificial query could help to prevent query formulation issues. Also, if semantic similarity could be modeled beyond exact lexical matches, more relevant results could be found even if the query terms don't match exactly. For this domain, we formulated a task to compare different ways of modeling semantic similarity at paragraph level, using neural and non-neural systems. We compared systems that encode the query and the search collection paragraphs as vectors, enabling the use of cosine similarity for results ranking. After building a German dataset for cases and statutes from Switzerland, and extracting citations from cases to statutes, we developed an algorithm for estimating semantic similarity at paragraph level, using a link-based similarity method. When evaluating different systems in this way, we find that semantic similarity modeling by neural systems can be boosted with an extended attention mask that quenches noise in the inputs.

Citations: 8
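The retrieval setup this abstract describes (query and collection paragraphs encoded as vectors, ranked by cosine similarity) can be sketched as follows. This is a minimal illustration with toy vectors, not the paper's actual encoder or dataset:

```python
import numpy as np

def cosine_rank(query_vec, paragraph_vecs):
    """Rank paragraphs by cosine similarity to the query vector.

    query_vec: (d,) array; paragraph_vecs: (n, d) array.
    Returns paragraph indices, most similar first.
    """
    q = query_vec / np.linalg.norm(query_vec)
    P = paragraph_vecs / np.linalg.norm(paragraph_vecs, axis=1, keepdims=True)
    sims = P @ q                # cosine similarity per paragraph
    return np.argsort(-sims)    # descending similarity order

# Toy example: 3 paragraph embeddings in a 4-dimensional space.
query = np.array([1.0, 0.0, 1.0, 0.0])
paras = np.array([
    [1.0, 0.1, 0.9, 0.0],   # near-duplicate of the query
    [0.0, 1.0, 0.0, 1.0],   # orthogonal to the query
    [0.5, 0.5, 0.5, 0.5],   # partial overlap
])
ranking = cosine_rank(query, paras)  # → [0, 2, 1]
```

In a real system the embeddings would come from a neural encoder (the paper also studies an extended attention mask on the encoder inputs); the ranking step itself stays this simple.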
Effectively Leveraging BERT for Legal Document Classification
Nut Limsopatham
Proceedings of the Natural Legal Language Processing Workshop 2021. DOI: 10.18653/v1/2021.nllp-1.22

Abstract: Bidirectional Encoder Representations from Transformers (BERT) has achieved state-of-the-art performance on several text classification tasks, such as GLUE and sentiment analysis. Recent work in the legal domain has started to use BERT on tasks such as legal judgement prediction and violation prediction. A common practice when using BERT is to fine-tune a pre-trained model on a target task and truncate the input texts to the size of the BERT input (e.g. at most 512 tokens). However, due to the unique characteristics of legal documents, it is not clear how to effectively adapt BERT to the legal domain. In this work, we investigate how to deal with long documents, and how important it is to pre-train on documents from the same domain as the target task. We conduct experiments on two recent datasets: the ECHR Violation Dataset and the Overruling Task Dataset, which are multi-label and binary classification tasks, respectively. Importantly, documents in the ECHR Violation Dataset contain more than 1,600 tokens on average, while documents in the Overruling Task Dataset are shorter (at most 204 tokens). We thoroughly compare several techniques for adapting BERT to long documents and compare models pre-trained on the legal and other domains. Our experimental results show that BERT needs to be explicitly adapted to handle long documents, as truncation leads to less effective performance. We also found that pre-training on documents similar to the target task results in more effective performance in several scenarios.

Citations: 15
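The abstract does not spell out which long-document techniques were compared, but a standard alternative to truncation is to split the token sequence into overlapping chunks that each fit BERT's input limit and then pool the per-chunk predictions. The sketch below shows only the chunking step; the `stride` value and the 1,600-token document length are illustrative, not taken from the paper:

```python
def chunk_tokens(token_ids, max_len=512, stride=256):
    """Split a long token sequence into overlapping chunks that each fit
    BERT's input limit; each chunk starts `stride` tokens after the last."""
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this chunk already reaches the end of the document
        start += stride
    return chunks

# A 1,600-token document (the ECHR average) becomes 6 overlapping chunks;
# per-chunk logits can then be pooled (e.g. max or mean) for the document.
doc = list(range(1600))
chunks = chunk_tokens(doc)
```

The overlap keeps sentences that straddle a chunk boundary visible to at least one chunk, at the cost of encoding some tokens twice.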
Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector
Rajdeep Sarkar, Atul Kr. Ojha, Jay Megaro, J. Mariano, Vall Herard, John P. Mccrae
Proceedings of the Natural Legal Language Processing Workshop 2021. DOI: 10.18653/v1/2021.nllp-1.10

Abstract: The application of predictive coding techniques to legal texts has the potential to greatly reduce the cost of legal review of documents. However, there is such a wide array of legal tasks and continuously evolving legislation that it is hard to construct sufficient training data to cover all cases. In this paper, we investigate few-shot and zero-shot approaches that require substantially less training data, and introduce a triplet architecture which, for promissory statements, produces performance close to that of a supervised system. This method allows predictive coding methods to be rapidly developed for new regulations and markets.

Citations: 7
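The paper's triplet architecture is not detailed in this abstract, but triplet training in general optimizes a margin loss over (anchor, positive, negative) embedding triples, pulling same-class examples together and pushing different-class examples apart. A minimal sketch with toy vectors, not the paper's learned embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on embedding vectors: the anchor should sit at
    least `margin` closer to the positive than to the negative example."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# A promissory statement (anchor) should embed nearer another promissory
# statement (positive) than an unrelated clause (negative).
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])
loss = triplet_loss(anchor, positive, negative)  # 0.0: triple already satisfied
```

A zero loss means the margin constraint already holds for this triple; swapping the positive and negative would yield a positive loss, producing a gradient in actual training.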
Automating Claim Construction in Patent Applications: The CMUmine Dataset
O. Tonguz, Yiwei Qin, Yimeng Gu, Hyun Hannah Moon
Proceedings of the Natural Legal Language Processing Workshop 2021. DOI: 10.18653/v1/2021.nllp-1.21

Abstract: Intellectual Property (IP) in the form of issued patents is a critical and highly desirable element of innovation in high-tech. In this position paper, we explore the possibility of automating the legal task of Claim Construction in patent applications via Natural Language Processing (NLP) and Machine Learning (ML). To this end, we first create a large dataset known as CMUmine™, and then demonstrate that, using NLP and ML techniques, Claim Construction in patent applications, a crucial legal task currently performed by IP attorneys, can be automated. To the best of our knowledge, this is the first public patent application dataset. Our results look very promising for automating the patent application process.

Citations: 2