Modeling Radiologists' Assessments to Explore Pairing Strategies for Optimized Double Reading of Screening Mammograms.

IF 3.1; Zone 3 (Medicine); JCR Q2, HEALTH CARE SCIENCES & SERVICES

Medical Decision Making. Pub Date: 2024-10-01. Epub Date: 2024-07-30. Pages: 828-842. DOI: 10.1177/0272989X241264572

Jessie J J Gommers, Craig K Abbey, Fredrik Strand, Sian Taylor-Phillips, David J Jenkinson, Marthe Larsen, Solveig Hofvind, Mireille J M Broeders, Ioannis Sechopoulos


Citations: 0

Abstract


Modeling Radiologists' Assessments to Explore Pairing Strategies for Optimized Double Reading of Screening Mammograms.

Purpose: To develop a model that simulates radiologist assessments and use it to explore whether pairing readers based on their individual performance characteristics could optimize screening performance.

Methods: Logistic regression models were designed and used to model individual radiologist assessments. For model evaluation, model-predicted individual performance metrics and paired disagreement rates were compared against the observed data using Pearson correlation coefficients. The logistic regression models were subsequently used to simulate different screening programs with reader pairing based on individual true-positive rates (TPR) and/or false-positive rates (FPR). For this, retrospective results from breast cancer screening programs employing double reading in Sweden, England, and Norway were used. Outcomes of random pairing were compared against those composed of readers with similar and opposite TPRs/FPRs, with positive assessments defined by either reader flagging an examination as abnormal.
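The pairing comparison described in the Methods can be illustrated with a toy sketch. The abstract does not give the model's covariates or coefficients, so everything below is an assumption made for illustration: the `reader_bias` intercepts, the `case_score` values, and the simplification that two readers' assessments are independent given the case. It is not the authors' model, only a minimal instance of "logistic reader model plus either-reader-flags rule".

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def recall_prob(reader_bias: float, case_score: float) -> float:
    """Hypothetical per-reader logistic model: P(reader flags the exam)."""
    return sigmoid(reader_bias + case_score)


def paired_recall_prob(bias_a: float, bias_b: float, case_score: float) -> float:
    """Either-reader rule: the exam is positive if at least one reader flags it.

    Assumes (for illustration only) that the two readers are independent
    given the case.
    """
    p_a = recall_prob(bias_a, case_score)
    p_b = recall_prob(bias_b, case_score)
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


def program_fpr(pairs, negative_case_scores) -> float:
    """Mean paired flag probability over cancer-free cases.

    Every pair is assumed to read an equal share of the cases.
    """
    total = 0.0
    for bias_a, bias_b in pairs:
        for score in negative_case_scores:
            total += paired_recall_prob(bias_a, bias_b, score)
    return total / (len(pairs) * len(negative_case_scores))


# Made-up values: four readers ordered from least to most likely to flag,
# and three cancer-free cases of increasing apparent suspiciousness.
negative_case_scores = [-1.0, 0.0, 1.0]
similar_pairs = [(-4.0, -3.5), (-3.0, -2.5)]   # like paired with like
opposite_pairs = [(-4.0, -2.5), (-3.5, -3.0)]  # low paired with high

fpr_similar = program_fpr(similar_pairs, negative_case_scores)
fpr_opposite = program_fpr(opposite_pairs, negative_case_scores)
print(f"similar pairing FPR:  {fpr_similar:.4f}")
print(f"opposite pairing FPR: {fpr_opposite:.4f}")
```

In this toy setup the similar pairing yields the lower program-level FPR. Under the either-reader rule the paired probability is p_a + p_b - p_a * p_b, and grouping readers with similar flag rates maximizes the summed product term while the summed p_a + p_b stays the same. Whether this carries over to real readers, whose assessments are correlated, is what the study's fitted models were used to examine.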

Results: The analysis data sets consisted of 936,621 (Sweden), 435,281 (England), and 1,820,053 (Norway) examinations. There was good agreement between the model-predicted and observed radiologists' TPR and FPR (r ≥ 0.969). Model-predicted negative-case disagreement rates showed high correlations (r ≥ 0.709), whereas positive-case disagreement rates had lower correlation levels due to sparse data (r ≥ 0.532). Pairing radiologists with similar FPR characteristics (Sweden: 4.50% [95% confidence interval: 4.46%-4.54%], England: 5.51% [5.47%-5.56%], Norway: 8.03% [7.99%-8.07%]) resulted in significantly lower FPR than with random pairing (Sweden: 4.74% [4.70%-4.78%], England: 5.76% [5.71%-5.80%], Norway: 8.30% [8.26%-8.34%]), reducing examinations sent to consensus/arbitration while the TPR did not change significantly. Other pairing strategies resulted in equal or worse performance than random pairing.
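To make the size of the reported effect easier to read, the following short sketch computes the absolute and relative FPR reductions from the figures quoted in the Results. The numbers are copied from the abstract; the interval-separation check is only an informal reading aid and is not the significance test the authors used, which the abstract does not describe.

```python
# FPR in percent as (estimate, CI lower, CI upper), copied from the Results.
reported = {
    "Sweden": {"similar": (4.50, 4.46, 4.54), "random": (4.74, 4.70, 4.78)},
    "England": {"similar": (5.51, 5.47, 5.56), "random": (5.76, 5.71, 5.80)},
    "Norway": {"similar": (8.03, 7.99, 8.07), "random": (8.30, 8.26, 8.34)},
}


def summarize(entry):
    """Return (absolute reduction in percentage points,
    relative reduction in percent, whether the two CIs are separated)."""
    similar, _, similar_upper = entry["similar"]
    random_, random_lower, _ = entry["random"]
    absolute = round(random_ - similar, 2)
    relative = round(100.0 * (random_ - similar) / random_, 1)
    separated = similar_upper < random_lower
    return absolute, relative, separated


summary = {country: summarize(entry) for country, entry in reported.items()}
for country, (absolute, relative, separated) in summary.items():
    print(f"{country}: -{absolute} pp ({relative}% relative), CIs separated: {separated}")
```

The absolute reductions come out at 0.24, 0.25, and 0.27 percentage points for Sweden, England, and Norway respectively, and in each country the upper bound for similar-FPR pairing lies below the lower bound for random pairing.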

Conclusions: Logistic regression models accurately predicted screening mammography assessments and helped explore different radiologist pairing strategies. Pairing readers with similar modeled FPR characteristics reduced the number of examinations unnecessarily sent to consensus/arbitration without significantly compromising the TPR.

Highlights: A logistic-regression model can be derived that accurately predicts individual and paired reader performance during mammography screening reading. Pairing screening mammography radiologists with similar false-positive characteristics reduced false-positive rates with no significant loss in true positives and may reduce the number of examinations unnecessarily sent to consensus/arbitration.


Source journal

Medical Decision Making (Medicine, Health Care)

CiteScore: 6.50

Self-citation rate: 5.60%

Articles published: 146

Review time: 6-12 weeks

About the journal: Medical Decision Making offers rigorous and systematic approaches to decision making that are designed to improve the health and clinical care of individuals and to assist with health care policy development. Using the fundamentals of decision analysis and theory, economic evaluation, and evidence-based quality assessment, Medical Decision Making presents both theoretical and practical statistical and modeling techniques and methods from a variety of disciplines.