Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity

Proceedings of the Natural Legal Language Processing Workshop 2021. DOI: 10.18653/v1/2021.nllp-1.12

Li Tang, S. Clematide


Citations: 8

Abstract

Searching for legal documents is a specialized Information Retrieval task that is relevant for expert users (lawyers and their assistants) and for non-expert users. By searching previous court decisions (cases), a user can better prepare the legal reasoning for a new case. Searching with a natural language text snippet instead of a more artificial query could help prevent query formulation issues. Also, if semantic similarity is modeled beyond exact lexical matches, more relevant results can be found even if the query terms don’t match exactly. For this domain, we formulated a task to compare different ways of modeling semantic similarity at paragraph level, using neural and non-neural systems. We compared systems that encode the query and the search collection paragraphs as vectors, enabling the use of cosine similarity for result ranking. After building a German-language dataset of cases and statutes from Switzerland and extracting citations from cases to statutes, we developed an algorithm for estimating semantic similarity at paragraph level, using a link-based similarity method. When evaluating different systems in this way, we find that semantic similarity modeling by neural systems can be boosted with an extended attention mask that quenches noise in the inputs.
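
The vector-based ranking described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say how the extended attention mask is built or where it is applied, so the sketch uses a hypothetical per-token keep/drop flag during mean pooling, and the function names (`masked_mean`, `cosine`, `rank_paragraphs`) and the toy two-dimensional vectors are invented for the example, not taken from the paper.

```python
import math


def masked_mean(token_vectors, keep_mask):
    """Average only the token vectors whose mask entry is truthy.

    Assumption: the mask is a plain keep/drop flag per token; the paper's
    actual extended attention mask may be constructed differently.
    """
    kept = [vec for vec, keep in zip(token_vectors, keep_mask) if keep]
    if not kept:
        raise ValueError("mask removes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_paragraphs(query_vec, paragraph_vecs):
    """Return (paragraph_id, score) pairs, highest cosine similarity first."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in paragraph_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy query: three token vectors, with the middle token flagged as noise.
query_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query_mask = [1, 0, 1]
query_vec = masked_mean(query_tokens, query_mask)

# Toy search collection: paragraph id -> already-encoded paragraph vector.
paragraphs = {"p1": [0.9, 0.1], "p2": [0.0, 1.0], "p3": [0.5, 0.5]}
ranking = rank_paragraphs(query_vec, paragraphs)

for pid, score in ranking:
    print(pid, round(score, 3))
```

In this toy setup, dropping the flagged token moves the query vector fully onto the first dimension, so paragraphs that share that direction rank higher. With a real encoder the token vectors would come from the model, and the mask would reflect whatever the system treats as noise.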

