Unsupervised Legal Evidence Retrieval via Contrastive Learning with Approximate Aggregated Positive

Feng Yao, Jingyuan Zhang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Yun Liu, Weixing Shen
Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4783-4791. Published 2023-06-26. DOI: 10.1609/aaai.v37i4.25603
Citations: 3

Abstract

Verifying the facts alleged by the prosecutors before the trial requires judges to retrieve evidence from the massive materials accompanying the case. Existing Legal AI applications often assume that the facts are already determined and overlook the difficulty of reconstructing them. To build a practical Legal AI application and free judges from manual searching, we introduce the task of Legal Evidence Retrieval, which aims to automatically retrieve the precise fact-related verbal evidence within a single case. We formulate the task in a dense retrieval paradigm and jointly learn contrastive representations and alignments between facts and evidence. To avoid tedious annotation, we construct an approximated positive vector for a given fact by aggregating a set of evidence from the same case. An entropy-based denoising technique is further applied to mitigate the impact of false positive samples. We train our models on tens of thousands of unlabeled cases and evaluate them on a labeled dataset containing 919 cases and 4,336 queries. Experimental results indicate that our approach is effective and outperforms other state-of-the-art representation and retrieval models. The dataset and code are available at https://github.com/yaof20/LER.
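The abstract names three ingredients (dense fact and evidence embeddings, an approximated positive aggregated from same-case evidence, and entropy-based denoising) but does not give their formulas. The following is a minimal pure-Python sketch of how they could fit together, under loudly stated assumptions: dot-product similarity, softmax attention as the aggregation rule, an InfoNCE loss with evidence from other cases as negatives, and a denoising weight equal to one minus the normalized attention entropy. All function names and both temperature parameters are hypothetical and not taken from the paper; the authors' actual implementation is in the repository linked above.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity between two embeddings (assumed scoring function)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over similarity scores."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregated_positive(
    fact: Vector, case_evidence: Sequence[Vector], attn_temperature: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Build an approximate positive for `fact` without labels.

    Hypothetical scheme: attend over every piece of evidence from the same case
    and return the attention-weighted sum together with the attention weights.
    """
    weights = softmax([dot(fact, e) for e in case_evidence], attn_temperature)
    positive = [
        sum(w * e[i] for w, e in zip(weights, case_evidence)) for i in range(len(fact))
    ]
    return positive, weights


def normalized_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy of the attention weights, scaled to [0, 1] (1 = uniform)."""
    if len(weights) < 2:
        return 0.0
    h = -sum(w * math.log(w) for w in weights if w > 0)
    return h / math.log(len(weights))


def denoised_contrastive_loss(
    fact: Vector,
    case_evidence: Sequence[Vector],
    negatives: Sequence[Vector],
    temperature: float = 0.1,
) -> float:
    """InfoNCE loss with the aggregated positive and out-of-case negatives.

    Assumed denoising rule (not from the paper): scale the loss by
    (1 - normalized entropy), so a fact whose attention is spread evenly over
    the case (no clear supporting evidence, a likely false positive)
    contributes less to training.
    """
    positive, weights = aggregated_positive(fact, case_evidence)
    logits = [dot(fact, positive) / temperature]
    logits += [dot(fact, n) / temperature for n in negatives]
    top = max(logits)
    log_partition = top + math.log(sum(math.exp(z - top) for z in logits))
    info_nce = log_partition - logits[0]  # -log softmax probability of the positive
    confidence = 1.0 - normalized_entropy(weights)
    return confidence * info_nce


if __name__ == "__main__":
    fact = [1.0, 0.0]
    case_evidence = [[0.9, 0.1], [0.0, 1.0]]  # toy evidence from the same case
    negatives = [[-1.0, 0.0], [0.0, -1.0]]  # toy evidence from other cases
    print(denoised_contrastive_loss(fact, case_evidence, negatives))
```

The entropy weight encodes only one possible reading of the denoising step: when a fact attends almost uniformly to every piece of evidence in its case, the aggregated positive is essentially the case average and is treated as less trustworthy. A practical system would compute these quantities in batches with a tensor library and train the fact and evidence encoders end to end.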