Leandi Liebenberg PhD, Kyra E. Stull PhD, Ericka N. L'Abbé PhD
{"title":"Patterns of observer error in scoring macromorphoscopic traits for population affinity","authors":"Leandi Liebenberg PhD, Kyra E. Stull PhD, Ericka N. L'Abbé PhD","doi":"10.1111/1556-4029.70063","DOIUrl":null,"url":null,"abstract":"<p>Revising methodologies is essential to understand the limitations and biases inherent in certain methods, which is crucial for obtaining reliable results. Due to the subjective nature of non-metric methods, variation in trait scoring and its impact on accurately classifying biological parameters remains a concern that requires further investigation. This study aimed to examine the effects of observer experience, familiarity with the method, and different statistical approaches on the repeatability of macromorphoscopic traits in the cranium for population affinity. Seventeen traits were scored on a sample of 10 crania by five observers with varying experience levels. Intra-observer agreement ranged from moderate to perfect, with three traits—inferior nasal margin, nasal bone shape, and nasal overgrowth demonstrating—the lowest agreement. Overall, inter-observer repeatability ranged from poor to substantial agreement. After a group discussion on the scoring procedure and subsequent rescoring of the crania, a slight improvement in agreement was observed, with kappa values shifting towards moderate and substantial levels. Each observer exhibited variation in the repeatability of different traits. While general experience did not consistently translate into proficiency with the method, familiarity with the specific traits and scoring procedures contributed to more consistent results. Therefore, method-specific training is crucial before applying the MMS traits in practice. Additionally, the choice of statistical approaches—such as applying different weights to Cohen's kappa based on data type—can influence the perceived reliability of a method. Practitioners should select weights and tests that are most appropriate for the data type of each trait being analyzed.</p>","PeriodicalId":15743,"journal":{"name":"Journal of forensic sciences","volume":"70 4","pages":"1489-1500"},"PeriodicalIF":1.8000,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1556-4029.70063","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of forensic sciences","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/1556-4029.70063","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, LEGAL","Score":null,"Total":0}
引用次数: 0
Abstract
Revising methodologies is essential to understand the limitations and biases inherent in certain methods, which is crucial for obtaining reliable results. Due to the subjective nature of non-metric methods, variation in trait scoring and its impact on accurately classifying biological parameters remains a concern that requires further investigation. This study aimed to examine the effects of observer experience, familiarity with the method, and different statistical approaches on the repeatability of macromorphoscopic traits in the cranium for population affinity. Seventeen traits were scored on a sample of 10 crania by five observers with varying experience levels. Intra-observer agreement ranged from moderate to perfect, with three traits—inferior nasal margin, nasal bone shape, and nasal overgrowth demonstrating—the lowest agreement. Overall, inter-observer repeatability ranged from poor to substantial agreement. After a group discussion on the scoring procedure and subsequent rescoring of the crania, a slight improvement in agreement was observed, with kappa values shifting towards moderate and substantial levels. Each observer exhibited variation in the repeatability of different traits. While general experience did not consistently translate into proficiency with the method, familiarity with the specific traits and scoring procedures contributed to more consistent results. Therefore, method-specific training is crucial before applying the MMS traits in practice. Additionally, the choice of statistical approaches—such as applying different weights to Cohen's kappa based on data type—can influence the perceived reliability of a method. Practitioners should select weights and tests that are most appropriate for the data type of each trait being analyzed.
期刊介绍:
The Journal of Forensic Sciences (JFS) is the official publication of the American Academy of Forensic Sciences (AAFS). It is devoted to the publication of original investigations, observations, scholarly inquiries and reviews in various branches of the forensic sciences. These include anthropology, criminalistics, digital and multimedia sciences, engineering and applied sciences, pathology/biology, psychiatry and behavioral science, jurisprudence, odontology, questioned documents, and toxicology. Similar submissions dealing with forensic aspects of other sciences and the social sciences are also accepted, as are submissions dealing with scientifically sound emerging science disciplines. The content and/or views expressed in the JFS are not necessarily those of the AAFS, the JFS Editorial Board, the organizations with which authors are affiliated, or the publisher of JFS. All manuscript submissions are double-blind peer-reviewed.