Evaluating expertise and sample bias effects for privilege classification in e-discovery

Proceedings of the 15th International Conference on Artificial Intelligence and Law. Pub Date: 2015-06-08. DOI: 10.1145/2746090.2746101

J. K. Vinjumur


Citations: 4

Abstract

In civil litigation, documents found to be relevant to a production request are usually subjected to an exhaustive manual review for privilege (e.g., attorney-client privilege or the attorney work-product doctrine) to ensure that materials that could be withheld are not inadvertently revealed. Most of the cost of such a review comes from having human annotators linearly review, for privilege, the documents that the classifier predicts to be responsive. This paper investigates the extent to which the privilege judgments obtained from these annotators are useful for training privilege classifiers. The judgments used in this paper are drawn from the privilege test collection created during the 2010 TREC Legal Track. The collection involves two classes of annotators: "expert" judges, who are the topic originators and are called Topic Authorities (TAs), and "non-expert" judges, called assessors. The paper asks two questions: (1) Are cheaper, non-expert annotations from assessors sufficient for classifier training? (2) Does the process of selecting special (adjudicated) documents for training affect the classifier results? The paper studies the effect of training classifiers on annotations from multiple annotators (with different levels of expertise) and on different training sets (with and without selection bias). The findings show that automated privilege classifiers trained on the unbiased set of annotations yield the best results. The biased annotations from experts and from non-experts are comparably useful for classifier training.
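The experimental design described in the abstract (two annotator classes crossed with two sampling conditions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic documents, the 20% assessor error rate, the naive Bayes classifier, the F1 metric, and the way selection bias is simulated (by putting documents on which the two annotators disagree first in the training sample, as a rough stand-in for adjudicated documents). The function names `make_doc`, `train_nb`, `predict_nb`, `f1`, and `run_grid` are hypothetical.

```python
import math
import random
from collections import Counter

LEGAL = ["counsel", "attorney", "advice", "privileged"]
GENERAL = ["meeting", "schedule", "report", "lunch", "budget"]

def make_doc(rng, privileged):
    """Synthetic token list; only privileged documents may contain legal terms."""
    pool = LEGAL + GENERAL if privileged else GENERAL
    return [rng.choice(pool) for _ in range(8)]

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model (token counts per class)."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab

def predict_nb(model, tokens):
    """Return the class (0 or 1) with the higher smoothed log-probability."""
    counts, n_docs, vocab = model
    total = sum(n_docs.values())
    best, best_score = 0, -math.inf
    for y in (0, 1):
        # Add-one smoothing on the prior and on token likelihoods.
        score = math.log((n_docs[y] + 1) / (total + 2))
        denom = sum(counts[y].values()) + len(vocab) + 1
        for t in tokens:
            score += math.log((counts[y][t] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best

def f1(gold, pred):
    """F1 for the positive (privileged) class; 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def run_grid(seed=0, n=400, train_size=100):
    """Train one classifier per (sample, annotator) condition; score on held-out data."""
    rng = random.Random(seed)
    truth = [int(rng.random() < 0.3) for _ in range(n)]
    docs = [make_doc(rng, y) for y in truth]
    expert = truth[:]  # assumption: expert labels are treated as gold
    assessor = [1 - y if rng.random() < 0.2 else y for y in truth]  # assumed 20% noise

    half = n // 2
    pool = list(range(half))     # candidate training documents
    test = list(range(half, n))  # held-out documents, scored against expert labels

    unbiased = rng.sample(pool, train_size)
    # Simulated selection bias: disagreements first, then randomly chosen agreements.
    disagree = [i for i in pool if expert[i] != assessor[i]]
    agree = [i for i in pool if expert[i] == assessor[i]]
    biased = (disagree + rng.sample(agree, len(agree)))[:train_size]

    results = {}
    for sample_name, idx in (("unbiased", unbiased), ("biased", biased)):
        for annot_name, labels in (("expert", expert), ("assessor", assessor)):
            model = train_nb([docs[i] for i in idx], [labels[i] for i in idx])
            pred = [predict_nb(model, docs[i]) for i in test]
            results[(sample_name, annot_name)] = f1([expert[i] for i in test], pred)
    return results
```

Calling `run_grid()` returns a dictionary keyed by `(sample, annotator)` pairs, one F1 score per training condition. Scores obtained on this synthetic data say nothing about the paper's actual results; the sketch only shows how the four conditions relate to one another.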



