Handwriting identification using random forests and score‐based likelihood ratios

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2021-12-03. DOI: 10.1002/sam.11566

M. Q. Johnson, Danica M. Ommen


Citations: 4

Abstract

Handwriting analysis is conducted by forensic document examiners who are able to visually recognize characteristics of writing to evaluate the evidence of writership. Recently, there have been incentives to investigate how to quantify the similarity between two written documents to support the conclusions drawn by experts. We use an automatic algorithm within the “handwriter” package in R to decompose a handwritten sample into small graphical units of writing. These graphs are sorted into 40 exemplar groups or clusters. We hypothesize that the frequency with which a person contributes graphs to each cluster is characteristic of their handwriting. Given two questioned handwritten documents, we can then use the vectors of cluster frequencies to quantify the similarity between the two documents. We extract features from the difference between the vectors and combine them using a random forest. The output from the random forest is used as the similarity score to compare documents. We estimate the distributions of the similarity scores computed from multiple pairs of documents known to have been written by the same and by different persons, and use these estimated densities to obtain score‐based likelihood ratios (SLRs) that rely on different assumptions. We find that the SLRs are able to indicate whether the similarity observed between two documents is more or less likely depending on writership.
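
The pipeline described in the abstract (cluster frequencies, a similarity score, estimated score densities, and their ratio) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the "handwriter" package. Several pieces are assumptions made only for illustration: the function names are hypothetical, the random forest is replaced by a simple stand-in score based on the L1 distance between frequency vectors, the densities are estimated with a Gaussian kernel at an arbitrary bandwidth of 0.05, and the documents are simulated rather than real handwriting. The paper also considers SLRs built under different assumptions; the sketch shows only the generic ratio of same-writer to different-writer score densities.

```python
import math
import random
from collections import Counter

N_CLUSTERS = 40  # number of exemplar clusters stated in the abstract


def cluster_frequencies(assignments, n_clusters=N_CLUSTERS):
    """Relative frequency with which a document's graphs fall in each cluster."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def similarity_score(freq_a, freq_b):
    """ASSUMPTION: stand-in for the random forest output described in the abstract.

    Returns 1 minus half the L1 distance, so identical vectors score 1.0
    and vectors with disjoint support score 0.0.
    """
    l1 = sum(abs(a - b) for a, b in zip(freq_a, freq_b))
    return 1.0 - l1 / 2.0


def gaussian_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate as a function of the score."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def score_based_lr(score, same_scores, diff_scores, bandwidth=0.05):
    """Ratio of estimated score densities: same-writer over different-writer."""
    numerator = gaussian_kde(same_scores, bandwidth)(score)
    denominator = gaussian_kde(diff_scores, bandwidth)(score)
    return numerator / denominator


def simulate_scores(n_writers=20, n_graphs=300, seed=0):
    """Simulated reference scores; synthetic data, not the paper's documents."""
    rng = random.Random(seed)
    clusters = range(N_CLUSTERS)
    # Each hypothetical writer gets their own cluster preference weights.
    weights = [[rng.gammavariate(0.5, 1.0) for _ in clusters] for _ in range(n_writers)]
    # Two simulated documents per writer, each reduced to a frequency vector.
    docs = [
        [cluster_frequencies(rng.choices(clusters, weights=w, k=n_graphs)) for _ in range(2)]
        for w in weights
    ]
    same, diff = [], []
    for a in range(n_writers):
        for b in range(n_writers):
            score = similarity_score(docs[a][0], docs[b][1])
            (same if a == b else diff).append(score)
    return same, diff


if __name__ == "__main__":
    same, diff = simulate_scores()
    for s in (0.35, 0.60, 0.85):
        print(f"score={s:.2f}  SLR={score_based_lr(s, same, diff):.3g}")
```

An SLR above 1 means the observed score is more probable under the same-writer score density than under the different-writer one, and an SLR below 1 means the reverse. In this sketch the value depends heavily on the bandwidth and on how the reference pairs are chosen, which is one reason different construction assumptions can yield different SLRs.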

