Unsupervised Legal Evidence Retrieval via Contrastive Learning with Approximate Aggregated Positive

Proceedings of the ... AAAI Conference on Artificial Intelligence. Pub Date: 2023-06-26. DOI: 10.1609/aaai.v37i4.25603

Feng Yao, Jingyuan Zhang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Yun Liu, Weixing Shen


Citations: 3

Abstract

Verifying the facts alleged by prosecutors before trial requires judges to retrieve evidence from the massive body of accompanying case materials. Existing Legal AI applications often assume the facts are already determined and overlook the difficulty of reconstructing them. To build a practical Legal AI application and free judges from manual searching, we introduce the task of Legal Evidence Retrieval, which aims to automatically retrieve the precise fact-related verbal evidence within a single case. We formulate the task in a dense retrieval paradigm and jointly learn the contrastive representations and alignments between facts and evidence. To avoid tedious annotation, we construct an approximate positive vector for a given fact by aggregating a set of evidence from the same case. An entropy-based denoising technique is further applied to mitigate the impact of false positive samples. We train our models on tens of thousands of unlabeled cases and evaluate them on a labeled dataset containing 919 cases and 4,336 queries. Experimental results indicate that our approach is effective and outperforms other state-of-the-art representation and retrieval models. The dataset and code are available at https://github.com/yaof20/LER.
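The abstract gives only a high-level description of the training signal, so the following is a minimal illustrative sketch, not the authors' implementation (see the repository above for that). It assumes that the aggregated positive is a similarity-weighted average of same-case evidence embeddings, that negatives come from other cases, and that the entropy of the aggregation weights is used to down-weight the loss when no evidence stands out. The function names, the temperature `tau`, and the weighting scheme are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(dot(v, v))
    return list(v) if norm == 0 else [x / norm for x in v]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregated_positive(fact, evidence, tau=0.1):
    """Return (positive_vector, normalized_entropy) for one fact.

    Assumption: the positive is a softmax-weighted average of the
    evidence embeddings taken from the same case as the fact.
    """
    fact = normalize(fact)
    evidence = [normalize(e) for e in evidence]
    weights = softmax([dot(fact, e) / tau for e in evidence])
    positive = [sum(w * e[i] for w, e in zip(weights, evidence))
                for i in range(len(fact))]
    # Entropy of the weights, scaled to [0, 1] by its maximum log(n).
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    max_entropy = math.log(len(evidence)) if len(evidence) > 1 else 1.0
    return normalize(positive), entropy / max_entropy

def contrastive_loss(fact, evidence, negatives, tau=0.1):
    """InfoNCE-style loss against the aggregated positive.

    Assumption: the loss is scaled by (1 - normalized entropy), so a
    fact whose evidence weights are nearly uniform contributes little.
    """
    fact_n = normalize(fact)
    positive, norm_entropy = aggregated_positive(fact, evidence, tau)
    pos_logit = dot(fact_n, positive) / tau
    logits = [pos_logit] + [dot(fact_n, normalize(n)) / tau for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return (1.0 - norm_entropy) * (log_denominator - pos_logit)
```

In this sketch, a fact with one clearly matching piece of evidence yields a low-entropy weight distribution and a nearly full-strength loss, while a fact whose evidence all looks equally similar is treated as a likely noisy positive and is almost ignored.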

