{"title":"Evaluating expertise and sample bias effects for privilege classification in e-discovery","authors":"J. K. Vinjumur","doi":"10.1145/2746090.2746101","DOIUrl":null,"url":null,"abstract":"In civil litigation, documents that are found to be relevant to a production request are usually subjected to an exhaustive manual review for privilege (e.g, for attorney-client privilege, attorney-work product doctrine) in order to be sure that materials that could be withheld is not inadvertently revealed. Usually, the majority of the cost associated in such review process is due to the procedure of having human annotators linearly review documents (for privilege) that the classifier predicts as responsive. This paper investigates the extent to which such privilege judgments obtained by the annotators are useful for training privilege classifiers. The judgments utilized in this paper are derived from the privilege test collection that was created during the 2010 TREC Legal Track. The collection consists of two classes of annotators: \"expert\" judges, who are topic originators called the Topic Authority (TA) and \"non-expert\" judges called assessors. The questions asked in this paper are; (1) Are cheaper, non-expert annotations from assessors sufficient for classifier training? (2) Does the process of selecting special (adjudicated) documents for training affect the classifier results? The paper studies the effect of training classifiers on multiple annotators (with different expertise) and training sets (with and without selection bias). The findings in this paper show that automated privilege classifiers trained on the unbiased set of annotations yield the best results. The usefulness of the biased annotations (from experts and non-experts) for classifier training are comparable.","PeriodicalId":309125,"journal":{"name":"Proceedings of the 15th International Conference on Artificial Intelligence and Law","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th International Conference on Artificial Intelligence and Law","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2746090.2746101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
In civil litigation, documents found to be relevant to a production request are usually subjected to an exhaustive manual review for privilege (e.g., attorney-client privilege or the attorney work-product doctrine) to ensure that materials that could be withheld are not inadvertently revealed. The majority of the cost of such a review process comes from having human annotators linearly review the documents (for privilege) that a classifier predicts as responsive. This paper investigates the extent to which the privilege judgments obtained from these annotators are useful for training privilege classifiers. The judgments used in this paper are drawn from the privilege test collection created during the 2010 TREC Legal Track. The collection contains judgments from two classes of annotators: "expert" judges, the topic originators known as Topic Authorities (TAs), and "non-expert" judges called assessors. The questions asked in this paper are: (1) Are cheaper, non-expert annotations from assessors sufficient for classifier training? (2) Does the process of selecting special (adjudicated) documents for training affect classifier results? The paper studies the effect of training classifiers on annotations from annotators with different expertise and on training sets with and without selection bias. The findings show that automated privilege classifiers trained on the unbiased set of annotations yield the best results, while the biased annotations (from experts and non-experts alike) are of comparable usefulness for classifier training.
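To make the comparison described above concrete, the following is a minimal sketch, not the authors' actual pipeline, of how one might train the same simple text classifier on privilege labels from different annotator pools and sampling schemes and score each variant on a common held-out set. All documents, labels, and condition names below are toy placeholders; the real study uses the TREC 2010 Legal Track privilege collection and the paper's own classifiers and evaluation protocol.

```python
# A minimal sketch, NOT the authors' implementation: train the same simple text
# classifier on privilege labels from different annotator pools / sampling schemes
# and compare them on a common held-out set. All documents and labels here are toy
# placeholders; the real study uses the TREC 2010 Legal Track privilege collection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def train_and_score(train_docs, train_labels, test_docs, test_labels):
    """Fit a bag-of-words logistic-regression classifier; return F1 on the test set."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train_docs, train_labels)
    return f1_score(test_labels, model.predict(test_docs))

# Toy placeholder data (1 = privileged, 0 = not privileged).
adjudicated_docs = [
    "memo from counsel on litigation strategy",
    "quarterly sales figures for the region",
    "draft settlement terms prepared by attorney",
    "cafeteria lunch schedule for next week",
]
ta_labels = [1, 0, 1, 0]            # expert (Topic Authority) judgments
assessor_labels = [1, 0, 0, 0]      # non-expert assessor judgments may disagree
random_docs = [
    "legal advice regarding the proposed merger",
    "team meeting notes and action items",
    "attorney work product on discovery plan",
    "travel itinerary for the sales conference",
]
random_labels = [1, 0, 1, 0]        # assessor judgments on a randomly sampled (unbiased) set
test_docs = ["opinion letter from outside counsel", "invitation to the office party"]
test_labels = [1, 0]

# Three training conditions mirroring the paper's setup: expert vs. non-expert labels
# on adjudicated (selection-biased) documents, and labels on an unbiased random sample.
conditions = {
    "TA, adjudicated (biased)": (adjudicated_docs, ta_labels),
    "assessor, adjudicated (biased)": (adjudicated_docs, assessor_labels),
    "assessor, random (unbiased)": (random_docs, random_labels),
}
for name, (docs, labels) in conditions.items():
    print(f"{name}: F1 = {train_and_score(docs, labels, test_docs, test_labels):.2f}")
```

The point of the sketch is the experimental design, not the model: holding the classifier fixed while swapping the source of training labels isolates the effect of annotator expertise and of selection bias in the training sample, which is the comparison the paper reports.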