Intra- and inter-rater reliability of a manual codification system for footwear impressions: first lessons learned from the development of a footwear database for forensic intelligence purposes

Journal Impact Factor: 0.2 (JCR Q4, Medicine, Legal)
Vincent Mousseau, Maralee Tapps, Romain Volery, Jean Brazeau
Canadian Society of Forensic Science Journal, published 6 November 2023. DOI: 10.1080/00085030.2023.2278911
Citations: 0

Abstract

To generate forensic intelligence from footwear impressions and link crime scenes, most law enforcement agencies and forensic laboratories rely on a manual codification system based on pattern recognition and classification by human analysts. However, although such systems are commonly used in practice, little is known to date about their reliability. Taking advantage of the development of a footwear database for forensic intelligence purposes at the Laboratoire de sciences judiciaires et de médecine légale in Quebec (Canada), this study aims to make a preliminary assessment of the intra- and inter-rater reliability (i.e., the level of repeatability over time and the level of consensus between analysts) of the proposed codification system. To do so, three forensic intelligence analysts classified a set of 27 crime scene impressions and test impressions at two different times, two weeks apart. Percent agreement, Cohen's Kappa, and Light's Kappa were then calculated. Results show that two of the three analysts reached an almost perfect level of intra-rater agreement, while the third reached a substantial level, and that the analysts reached a substantial level of inter-rater agreement. Findings suggest that, although a few patterns may show lower levels of agreement, the codification system as a whole presents a satisfactory level of reliability. This preliminary study thus suggests that, contrary to what advocates of fully automated systems sometimes imply, manual codification of footwear impressions may be fairly appropriate for intelligence purposes.
It calls for further evaluative research in the field.

Résumé

To generate forensic intelligence from footwear marks and thereby situate a single crime scene within a crime series, most police forces rely on a manual codification system based on the recognition and classification of certain shapes or patterns by analysts trained in this task. Although these systems are commonly used in day-to-day practice, few studies have so far attempted to determine their reliability. Taking advantage of the development of a footwear mark and print profiling service for forensic intelligence purposes at the Laboratoire de sciences judiciaires et de médecine légale in Quebec (Canada), the present study seeks to carry out a preliminary assessment of the intra- and inter-rater reliability (i.e., the level of repeatability over time and the level of consensus between analysts) of the manual codification system developed there. To do so, three members of the Laboratory's forensic intelligence service codified the same set of 27 footwear marks and prints twice, two weeks apart. Percent agreement, Cohen's Kappa, and Light's Kappa were then calculated from the collected data. The results show that two of the three analysts reached a level of intra-rater agreement (repeatability) considered almost perfect, and that all of them reached a level of inter-rater agreement (consensus) considered substantial. The results also suggest that, although a few patterns and shapes show lower levels of agreement, the codification system as a whole presents a satisfactory level of reliability. This preliminary study therefore suggests that, despite growing attention to automated codification systems, manual codification of footwear marks and prints remains a potentially appropriate method for generating forensic intelligence, and it calls for further evaluative research in this field.

Keywords: Forensic intelligence; reliability; repeatability; reproducibility; data acquisition; footwear impression

Mots-clés: Forensic intelligence; reliability; repeatability; reproducibility; footwear marks; database

Acknowledgements

The authors would like to thank Caroline Mireault from the Chemistry Department of the Laboratoire de sciences judiciaires et de médecine légale for her considerable support in initiating and continuing the implementation of this project in its early years.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 Our translation.
2 Austin Hicklin et al. [1, p. 15] wrote: "even if two examiners observe the same features in correspondence/non-correspondence, they may assign different strengths to these observations based upon factors such as their training and experience." This highlights the discrepancies that may exist between codifications and comparison conclusions and, consequently, the need to study both independently.
3 The authors acknowledge that a sample of three participants is quite small, but the three analysts were the only practitioners in the Forensic Intelligence Service at the time of the study who were responsible for footwear impression analysis, as the other members of the Service specialized in toxicology and drug intelligence.
4 Although they do not perform traditional forensic footwear examination and comparison, they all have the minimum education proposed by the Scientific Working Group for Shoeprint and Tire Tread Evidence (SWGTREAD) to do so [40].
5 Cohen's Kappa and Light's Kappa cannot be calculated for constant patterns (i.e., "0" for all codified entries). There must be at least minimal variation, so that one can observe both when raters recognize a pattern and when they do not, and so that intra-rater reliability can be evaluated for as many patterns of the codification system as possible. This explains why more metrics were computed for the inter-rater reliability test (with 27 traces and prints) than for the intra-rater reliability test (with 20 traces and prints).
6 Those seven photographs were selected according to their initial classification in routine work.
7 The alphanumeric code for each pattern (see Results and Appendices) therefore reads as follows: the letters and numbers before the parenthesis correspond to the pattern, while the letter in parentheses (P, A, or T) corresponds to the section of the outsole where the pattern is observed.

Additional information

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
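The agreement statistics named in the abstract, and the limitation described in note 5, can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the authors' implementation: each pattern is assumed to be coded per impression as 1 (recognized) or 0 (not recognized), as note 5 implies; Light's Kappa is computed as the mean of all pairwise Cohen's Kappas; the verbal labels assume the Landis and Koch (1977) scale, whose wording matches the abstract but which the abstract does not name; returning `None` when any pairwise Kappa is undefined is a choice made here; and the rater codings are invented toy data, not study data.

```python
from itertools import combinations
from typing import Optional, Sequence

# Assumed data layout: for ONE pattern of the codification system, each coding
# is a list with one entry per impression (1 = pattern recognized, 0 = not).


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of impressions on which two codings give the same value."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("codings must be non-empty and of equal length")
    if (set(a) | set(b)) - {0, 1}:
        raise ValueError("codings must contain only 0 and 1")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> Optional[float]:
    """Cohen's Kappa for two binary codings, or None when it is undefined."""
    p_o = percent_agreement(a, b)  # observed agreement (also validates input)
    if len(set(a)) == 1 and set(a) == set(b):
        # Both codings constant on the same value (e.g. all "0"): chance
        # agreement equals 1, so Kappa reduces to 0/0 (cf. note 5).
        return None
    n = len(a)
    # Chance agreement from each coding's marginal proportions of 0s and 1s.
    p_e = sum((list(a).count(v) / n) * (list(b).count(v) / n) for v in (0, 1))
    return (p_o - p_e) / (1 - p_e)


def lights_kappa(codings: Sequence[Sequence[int]]) -> Optional[float]:
    """Light's Kappa: mean of Cohen's Kappa over all pairs of raters.

    Returns None if any pairwise Kappa is undefined (a choice made here).
    """
    kappas = [cohens_kappa(a, b) for a, b in combinations(codings, 2)]
    if not kappas or any(k is None for k in kappas):
        return None
    return sum(kappas) / len(kappas)


def landis_koch(kappa: float) -> str:
    """Verbal label on the Landis and Koch (1977) scale (assumed here)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Invented toy codings of one pattern on 8 impressions (not study data).
    rater1 = [1, 1, 0, 0, 0, 0, 1, 0]
    rater2 = [1, 1, 0, 0, 0, 0, 0, 0]
    rater3 = [1, 0, 0, 0, 0, 0, 1, 0]
    k = cohens_kappa(rater1, rater2)
    print(f"agreement={percent_agreement(rater1, rater2):.3f} "
          f"kappa={k:.3f} ({landis_koch(k)})")
    lk = lights_kappa([rater1, rater2, rater3])
    print(f"Light's kappa={lk:.3f} ({landis_koch(lk)})")
    print(cohens_kappa([0] * 8, [0] * 8))  # constant pattern -> None
```

With these toy codings, raters 1 and 2 agree on 7 of 8 impressions (0.875), chance agreement is 0.5625, so Cohen's Kappa is (0.875 - 0.5625) / (1 - 0.5625) ≈ 0.714 ("substantial"); Light's Kappa over the three pairs is about 0.587 ("moderate"). A pattern coded "0" everywhere yields `None`, which mirrors why note 5 reports fewer computable metrics for the intra-rater test.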