Interobserver variability of recall decisions between mammography readers in the English NHS breast screening programme: A comparison of interobserver variability measures


European Journal of Radiology. Pub Date: 2026-04-01; Epub Date: 2026-02-07. DOI: 10.1016/j.ejrad.2026.112723

Laura Quinn, David Jenkinson, Sian Taylor-Phillips, Yemisi Takwoingi, Alice Sitch

Volume 197, Article 112723


Abstract

Objectives

To evaluate interobserver variability between mammogram readers’ recall decisions in the English NHS breast screening programme, comparing different variability measures.

Methods

Data were included from 401,682 women at 22 NHS centres whose screening mammograms were interpreted independently by two mammogram readers. Percentage agreement, prevalence-adjusted bias-adjusted kappa (PABAK), Gwet’s agreement coefficient (Gwet’s AC) and Cohen’s kappa were reported with 95% confidence intervals. Analyses were performed separately for women at first and at subsequent screening appointments, and by cancer diagnosis, reader recall rate and age group.
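
The four measures named above have standard closed forms for two readers making a binary recall/no-recall decision. The sketch below computes them from a 2×2 table of paired decisions. The `TwoByTwo` container and all function names are hypothetical; the formulas are the textbook two-reader definitions (Cohen’s kappa with each reader’s own marginals, PABAK as 2Po - 1, and Gwet’s AC in its binary AC1 form with chance agreement 2π(1 - π)). The simple Wald interval treats every woman as an independent observation and ignores clustering within readers and centres; the abstract does not state which confidence-interval method the authors used, so this is an illustration, not a reproduction of their analysis.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Paired recall decisions from two readers (hypothetical container).

    a: both readers recall      b: only reader 1 recalls
    c: only reader 2 recalls    d: neither reader recalls
    """

    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    @property
    def recall_rates(self) -> tuple[float, float]:
        """Each reader's own recall rate (the table's marginal proportions)."""
        return (self.a + self.b) / self.n, (self.a + self.c) / self.n


def percentage_agreement(t: TwoByTwo) -> float:
    """Observed agreement Po: share of women given the same decision by both readers."""
    return (t.a + t.d) / t.n


def cohens_kappa(t: TwoByTwo) -> float:
    """Cohen's kappa: chance agreement uses each reader's own marginals."""
    p1, p2 = t.recall_rates
    pe = p1 * p2 + (1 - p1) * (1 - p2)
    return (percentage_agreement(t) - pe) / (1 - pe)


def pabak(t: TwoByTwo) -> float:
    """PABAK for two categories: chance agreement fixed at 0.5, giving 2*Po - 1."""
    return 2 * percentage_agreement(t) - 1


def gwets_ac1(t: TwoByTwo) -> float:
    """Gwet's AC1: chance agreement 2*pi*(1 - pi), pi = mean recall rate of the two readers."""
    p1, p2 = t.recall_rates
    pi = (p1 + p2) / 2
    pe = 2 * pi * (1 - pi)
    return (percentage_agreement(t) - pe) / (1 - pe)


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Simple Wald interval for a proportion (assumes n independent observations)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Usage with an invented table of 1,000 paired decisions (not study data).
example = TwoByTwo(a=40, b=10, c=10, d=940)
po = percentage_agreement(example)
lo, hi = wald_ci(po, example.n)
print(f"Po={po:.3f} (95% CI {lo:.3f}-{hi:.3f}) kappa={cohens_kappa(example):.3f} "
      f"PABAK={pabak(example):.3f} AC1={gwets_ac1(example):.3f}")
```

Working from the 2×2 counts rather than from summary rates keeps each reader’s marginal recall rate available, which is exactly what separates Cohen’s kappa (two separate marginals) from Gwet’s AC1 (their average) and PABAK (no marginals at all).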

Results

Of 86,287 women at first screening, 6,491 (7.5%) were recalled, compared with 9,488 (3.0%) of 315,395 at subsequent screenings. Percentage agreement, Gwet’s AC, and PABAK were lower for first than for subsequent screening (percentage agreement 93.6%, 95% CI: 93.4–93.7 vs 97.2%, 95% CI: 97.2–97.3; Gwet’s AC 92.3, 95% CI: 92.1–92.5 vs 97.0, 95% CI: 97.0–97.1; PABAK 87.2, 95% CI: 86.9–87.4 vs 94.4, 95% CI: 94.3–94.5), whereas Cohen’s kappa, which is biased downwards when the prevalence of recall is lower, did not change (61.6, 95% CI: 60.7–62.5 vs 61.8, 95% CI: 61.0–62.5). Percentage agreement, Gwet’s AC, and PABAK were lower for women in whom cancer was detected than for those without cancer, but Cohen’s kappa showed the opposite pattern, driven by prevalence bias. Percentage agreement, Gwet’s AC, and PABAK were also lower when one or both readers had a high recall rate, whereas Cohen’s kappa showed no important pattern.
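
For a binary decision, PABAK depends only on observed agreement (PABAK = 2Po - 1), so the reported PABAK values can be cross-checked against the reported percentage agreement, and the recall percentages against the counts. The sketch below uses the abstract’s rounded figures; the dictionary layout and the `pabak_from_pa` and `recall_percent` names are hypothetical, and because the inputs are rounded to one decimal place, agreement is only expected to the reported precision.

```python
# Rounded figures as reported in the abstract (coefficients on a 0-100 scale).
FIRST = {"n": 86_287, "recalled": 6_491, "pa": 93.6, "pabak": 87.2}
SUBSEQUENT = {"n": 315_395, "recalled": 9_488, "pa": 97.2, "pabak": 94.4}


def pabak_from_pa(pa_percent: float) -> float:
    """Binary PABAK on a 0-100 scale: 100 * (2 * Po - 1) = 2 * PA - 100."""
    return 2 * pa_percent - 100


def recall_percent(group: dict) -> float:
    """Percentage of women recalled in a screening group."""
    return 100 * group["recalled"] / group["n"]


# The two screening groups together make up the full cohort.
assert FIRST["n"] + SUBSEQUENT["n"] == 401_682

for label, group in (("first", FIRST), ("subsequent", SUBSEQUENT)):
    print(
        f"{label}: recall {recall_percent(group):.1f}%, "
        f"PABAK implied by PA {pabak_from_pa(group['pa']):.1f} "
        f"(reported {group['pabak']})"
    )
```

The same identity explains why percentage agreement and PABAK always move together in these results: PABAK is a rescaling of percentage agreement, not an independent measure.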

Conclusions

Percentage agreement, Gwet’s AC, and PABAK showed lower agreement when readers interpreted the more challenging first screen (without previous mammograms for comparison), when women had cancer, and when one or both readers had a high recall rate. Cohen’s kappa was heavily distorted by outcome prevalence. Despite its widespread use, Cohen’s kappa is inappropriate in low-prevalence settings such as screening, and for comparisons between groups in which prevalence varies.
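
The prevalence distortion described above can be reproduced with two hypothetical 2×2 tables (invented for illustration, not study data) that have identical percentage agreement (90%) but different recall prevalence (50% vs 10%). The sketch uses the standard two-reader binary formulas; `agreement_measures` is a hypothetical helper name.

```python
def agreement_measures(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Binary two-reader agreement measures from a 2x2 table.

    a: both recall; b: only reader 1 recalls; c: only reader 2 recalls; d: neither recalls.
    """
    n = a + b + c + d
    po = (a + d) / n                               # observed agreement
    p1, p2 = (a + b) / n, (a + c) / n              # each reader's recall rate
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)       # Cohen's chance agreement
    pi = (p1 + p2) / 2                             # mean recall rate
    pe_ac1 = 2 * pi * (1 - pi)                     # Gwet's AC1 chance agreement
    return {
        "percentage_agreement": po,
        "cohens_kappa": (po - pe_kappa) / (1 - pe_kappa),
        "pabak": 2 * po - 1,
        "gwets_ac1": (po - pe_ac1) / (1 - pe_ac1),
    }


balanced = agreement_measures(45, 5, 5, 45)  # 50% recall prevalence
rare = agreement_measures(5, 5, 5, 85)       # 10% recall prevalence

for name in balanced:
    print(f"{name:22s} balanced={balanced[name]:.3f} rare={rare[name]:.3f}")
```

With these inputs Cohen’s kappa falls from 0.80 to about 0.44 while percentage agreement (0.90) and PABAK (0.80) are unchanged, and Gwet’s AC1 rises to about 0.88: the readers agree equally often, but kappa’s chance-agreement term grows as recall becomes rarer.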





About the journal: European Journal of Radiology is an international journal that aims to communicate state-of-the-art information on imaging developments to its readers, in the form of high-quality original research articles and timely reviews of current developments in the field. Its audience includes clinicians at all levels of training, including radiology trainees, newly qualified imaging specialists and experienced radiologists. Its aim is to inform efficient, appropriate and evidence-based imaging practice to the benefit of patients worldwide.