Unsupervised Anomaly Detection in Parole Hearings using Language Models

G. Todd, Catalin Voss, Jenny Hong
DOI: 10.18653/v1/2020.nlpcss-1.8
Published in: Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, November 2020
Citations: 1

Abstract

Each year, thousands of roughly 150-page parole hearing transcripts in California go unread because legal experts lack the time to review them. Yet, reviewing transcripts is the only means of public oversight in the parole process. To assist reviewers, we present a simple unsupervised technique for using language models (LMs) to identify procedural anomalies in long-form legal text. Our technique highlights unusual passages that suggest further review may be necessary. We identify passages using a contrastive perplexity score, defined as the scaled difference between a passage's perplexities under two LMs: one fine-tuned on the target (parole) domain, and another pre-trained on out-of-domain text to normalize for grammatical or syntactic anomalies. We present quantitative analysis of the results and note that our method has identified some important cases for review. We are also excited about potential applications in unsupervised anomaly detection, and present a brief analysis of results for detecting fake TripAdvisor reviews.
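The contrastive-perplexity idea can be sketched in a few lines. The toy unigram LMs, corpora, token lists, and `scale` parameter below are all illustrative assumptions standing in for the paper's fine-tuned and pre-trained neural LMs; only the scoring scheme itself (in-domain perplexity minus out-of-domain perplexity) follows the abstract.

```python
import math
from collections import Counter

def train_unigram(tokens, vocab):
    # Laplace-smoothed unigram LM -- a toy stand-in for a neural LM.
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def perplexity(lm, tokens):
    # exp(average negative log-likelihood) of the passage under the LM.
    nll = -sum(math.log(lm[t]) for t in tokens) / len(tokens)
    return math.exp(nll)

def contrastive_score(tokens, domain_lm, general_lm, scale=1.0):
    # Scaled difference between in-domain and out-of-domain perplexity.
    # High scores flag passages that are unusual for the domain but not
    # merely ungrammatical: the general LM normalizes those cases out.
    return scale * (perplexity(domain_lm, tokens) - perplexity(general_lm, tokens))

# Toy corpora (illustrative; the paper uses parole hearing transcripts).
domain_corpus = "the board finds the inmate suitable for parole the board denies parole".split()
general_corpus = "the weather is nice today the movie was good".split()
typical = "the board denies parole".split()       # ordinary for the domain
anomalous = "the weather was good".split()        # fluent English, odd for the domain

vocab = set(domain_corpus) | set(general_corpus) | set(typical) | set(anomalous)
domain_lm = train_unigram(domain_corpus, vocab)
general_lm = train_unigram(general_corpus, vocab)

score_typical = contrastive_score(typical, domain_lm, general_lm)
score_anomalous = contrastive_score(anomalous, domain_lm, general_lm)
```

Ranking passages by this score surfaces the domain-anomalous passage first: it is perplexing to the in-domain LM but unremarkable to the general LM, so its score is high, while the typical passage scores low or negative.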