Interobserver variability of recall decisions between mammography readers in the English NHS breast screening programme: A comparison of interobserver variability measures

IF 3.3, Zone 3 (Medicine), Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
European Journal of Radiology Pub Date : 2026-04-01 Epub Date: 2026-02-07 DOI:10.1016/j.ejrad.2026.112723
Laura Quinn, David Jenkinson, Sian Taylor-Phillips, Yemisi Takwoingi, Alice Sitch
European Journal of Radiology, volume 197, Article 112723. Full text: https://www.sciencedirect.com/science/article/pii/S0720048X26000719

Abstract

Objectives

To evaluate interobserver variability between mammogram readers’ recall decisions in the English NHS breast screening programme, comparing different variability measures.

Methods

The analysis included data from 401,682 women at 22 NHS centres whose screening mammograms were interpreted independently by two readers. Percentage agreement, prevalence-adjusted bias-adjusted kappa (PABAK), Gwet’s agreement coefficient (Gwet’s AC) and Cohen’s kappa were reported with 95% confidence intervals. Analyses were performed separately for women at first and subsequent screening appointments, and by cancer diagnosis, reader recall rate and age group.
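To make the four measures concrete, the following is a minimal sketch of how each can be computed from a 2x2 table of paired recall decisions. It is not the authors' code. It assumes "Gwet’s AC" refers to the first-order coefficient (AC1) for two readers and a binary decision, and the class name, function name and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoReaderTable:
    """Counts of women by the paired decisions of two readers (hypothetical layout)."""
    both_recall: int    # reader 1 recall, reader 2 recall
    only_reader1: int   # reader 1 recall, reader 2 no recall
    only_reader2: int   # reader 1 no recall, reader 2 recall
    neither: int        # both readers: no recall


def agreement_measures(t: TwoReaderTable) -> dict[str, float]:
    n = t.both_recall + t.only_reader1 + t.only_reader2 + t.neither
    po = (t.both_recall + t.neither) / n        # observed proportion agreement
    p1 = (t.both_recall + t.only_reader1) / n   # reader 1 recall rate
    p2 = (t.both_recall + t.only_reader2) / n   # reader 2 recall rate

    # Cohen's kappa: chance agreement from each reader's own recall rate.
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (po - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1 (assumed variant): chance agreement from the mean recall rate.
    mean_rate = (p1 + p2) / 2
    pe_ac1 = 2 * mean_rate * (1 - mean_rate)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    # PABAK: chance agreement fixed at 0.5 for two categories.
    pabak = 2 * po - 1

    return {"percent_agreement": po, "pabak": pabak, "gwet_ac1": ac1, "cohen_kappa": kappa}


# Made-up counts for 1,000 women; not data from the study.
example = agreement_measures(TwoReaderTable(40, 30, 30, 900))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The measures differ only in how they model chance agreement, which is why they can diverge sharply when recall is rare. The sketch does not handle the degenerate case where every decision falls in a single category, which makes the kappa denominator zero.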

Results

Of 86,287 women at first screening, 6,491 (7.5%) were recalled, compared with 9,488 (3.0%) of 315,395 at subsequent screenings. Percentage agreement, Gwet’s AC and PABAK were all lower at first screening than at subsequent screenings: percentage agreement 93.6% (95% CI: 93.4–93.7) vs 97.2% (95% CI: 97.2–97.3), Gwet’s AC 92.3 (95% CI: 92.1–92.5) vs 97.0 (95% CI: 97.0–97.1), and PABAK 87.2 (95% CI: 86.9–87.4) vs 94.4 (95% CI: 94.3–94.5). Cohen’s kappa, which is biased downwards when the prevalence of recall is lower, did not change (61.6, 95% CI: 60.7–62.5 vs 61.8, 95% CI: 61.0–62.5). Percentage agreement, Gwet’s AC and PABAK were lower for women with cancer detected than for those without, but Cohen’s kappa showed the opposite pattern, driven by prevalence bias. Percentage agreement, Gwet’s AC and PABAK were lower when one or both readers had high recall rates, but Cohen’s kappa showed no important pattern.
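As a consistency check on these figures: for two readers and a binary decision, PABAK is a linear function of the observed proportion agreement, written here as P_o (a symbol introduced for this note, not taken from the paper). Substituting the reported percentage agreement reproduces the reported PABAK values of 87.2 and 94.4 on a 0 to 100 scale.

```latex
\begin{align*}
\mathrm{PABAK} &= 2P_o - 1 \\
\text{First screening:}\quad & 2(0.936) - 1 = 0.872 \\
\text{Subsequent screening:}\quad & 2(0.972) - 1 = 0.944
\end{align*}
```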

Conclusions

Percentage agreement, Gwet’s AC and PABAK showed lower agreement at the more challenging first screen, which is read without the assistance of previous mammograms, when women had cancer, and when one or both readers had a high recall rate. Cohen’s kappa was heavily distorted by outcome prevalence. Despite its widespread use, Cohen’s kappa is inappropriate for low-prevalence settings such as screening, and for comparisons in which prevalence varies.
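The prevalence effect can be illustrated with a small sketch. The two tables below are invented for illustration and are not study data: both have 94% observed agreement among 1,000 women, but the readers recall 50% of women in one and 5% in the other. The function name and argument layout are hypothetical.

```python
def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table.

    a = both readers recall, b = only reader 1 recalls,
    c = only reader 2 recalls, d = neither recalls.
    """
    n = a + b + c + d
    po = (a + d) / n                      # observed agreement
    p1, p2 = (a + b) / n, (a + c) / n     # each reader's recall rate
    pe = p1 * p2 + (1 - p1) * (1 - p2)    # agreement expected by chance
    return (po - pe) / (1 - pe)


# Same observed agreement (94%), different prevalence of recall.
balanced = cohen_kappa(470, 30, 30, 470)  # each reader recalls 50%
rare = cohen_kappa(20, 30, 30, 920)       # each reader recalls 5%

print(f"balanced prevalence: kappa = {balanced:.2f}")
print(f"rare recall:         kappa = {rare:.2f}")
```

With identical observed agreement, kappa is much lower in the rare-recall table because chance-expected agreement is already high when almost every decision is "no recall". This is the distortion that makes kappa hard to compare across groups with different recall rates.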
Source journal
CiteScore: 6.70
Self-citation rate: 3.00%
Articles published: 398
Review time: 42 days
About the journal: European Journal of Radiology is an international journal that aims to communicate state-of-the-art information on imaging developments to its readers, in the form of high-quality original research articles and timely reviews of current developments in the field. Its audience includes clinicians at all levels of training, including radiology trainees, newly qualified imaging specialists and experienced radiologists. Its aim is to inform efficient, appropriate and evidence-based imaging practice to the benefit of patients worldwide.