Jessie J J Gommers, Craig K Abbey, Fredrik Strand, Sian Taylor-Phillips, David J Jenkinson, Marthe Larsen, Solveig Hofvind, Mireille J M Broeders, Ioannis Sechopoulos
Medical Decision Making, pages 828-842. Published 2024-10-01 (Epub 2024-07-30). DOI: https://doi.org/10.1177/0272989X241264572. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11490068/pdf/
Modeling Radiologists' Assessments to Explore Pairing Strategies for Optimized Double Reading of Screening Mammograms.
Purpose: To develop a model that simulates radiologist assessments and use it to explore whether pairing readers based on their individual performance characteristics could optimize screening performance.
Methods: Logistic regression models were designed and used to model individual radiologist assessments. For model evaluation, model-predicted individual performance metrics and paired disagreement rates were compared against the observed data using Pearson correlation coefficients. The logistic regression models were subsequently used to simulate different screening programs with reader pairing based on individual true-positive rates (TPR) and/or false-positive rates (FPR). For this, retrospective results from breast cancer screening programs employing double reading in Sweden, England, and Norway were used. Outcomes of random pairing were compared against those composed of readers with similar and opposite TPRs/FPRs, with positive assessments defined by either reader flagging an examination as abnormal.
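The pairing comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' model: the reader biases, case scores, the single-intercept logistic form in `recall_prob`, and the assumption that two readers decide independently given the case are all hypothetical choices made for illustration. It ranks readers by their individual FPR, forms similar, opposite, and random pairs, and computes the paired FPR when either reader flagging an examination counts as positive.

```python
import math
import random


def recall_prob(reader_bias, case_score):
    """Hypothetical logistic model: probability that a reader flags a case."""
    return 1.0 / (1.0 + math.exp(-(reader_bias + case_score)))


def individual_fpr(reader_bias, negative_scores):
    """Expected recall rate of one reader over cancer-free examinations."""
    probs = [recall_prob(reader_bias, s) for s in negative_scores]
    return sum(probs) / len(probs)


def paired_fpr(pairs, negative_scores):
    """Expected rate at which at least one reader of a pair flags a cancer-free case.

    Assumes the two readers decide independently given the case score,
    and that every pair reads the same set of cases.
    """
    total = 0.0
    for bias_a, bias_b in pairs:
        for s in negative_scores:
            p_a = recall_prob(bias_a, s)
            p_b = recall_prob(bias_b, s)
            total += 1.0 - (1.0 - p_a) * (1.0 - p_b)
    return total / (len(pairs) * len(negative_scores))


def adjacent_pairs(readers):
    """Pair neighbours in the given order: (0, 1), (2, 3), ..."""
    return [(readers[i], readers[i + 1]) for i in range(0, len(readers), 2)]


def opposite_pairs(ranked):
    """Pair the lowest-ranked reader with the highest, and so on inward."""
    return [(ranked[i], ranked[-1 - i]) for i in range(len(ranked) // 2)]


rng = random.Random(0)
# Hypothetical readers and cancer-free cases (illustrative values only).
reader_biases = [rng.gauss(-3.5, 0.5) for _ in range(20)]
negative_scores = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Rank readers by their individual false-positive rate.
ranked = sorted(reader_biases, key=lambda b: individual_fpr(b, negative_scores))

shuffled = ranked[:]
rng.shuffle(shuffled)

fpr_similar = paired_fpr(adjacent_pairs(ranked), negative_scores)
fpr_random = paired_fpr(adjacent_pairs(shuffled), negative_scores)
fpr_opposite = paired_fpr(opposite_pairs(ranked), negative_scores)

print(f"similar:  {fpr_similar:.4%}")
print(f"random:   {fpr_random:.4%}")
print(f"opposite: {fpr_opposite:.4%}")
```

Under these assumptions the paired probability is p_a + p_b - p_a * p_b, and the product term is largest when the two readers have similar rates, so similar pairing cannot give a higher paired FPR than the other two strategies. The sketch says nothing about TPR and ignores any correlation between readers beyond the shared case score.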
Results: The analysis data sets consisted of 936,621 (Sweden), 435,281 (England), and 1,820,053 (Norway) examinations. There was good agreement between the model-predicted and observed radiologists' TPR and FPR (r ≥ 0.969). Model-predicted negative-case disagreement rates showed high correlations (r ≥ 0.709), whereas positive-case disagreement rates had lower correlation levels due to sparse data (r ≥ 0.532). Pairing radiologists with similar FPR characteristics (Sweden: 4.50% [95% confidence interval: 4.46%-4.54%], England: 5.51% [5.47%-5.56%], Norway: 8.03% [7.99%-8.07%]) resulted in significantly lower FPR than with random pairing (Sweden: 4.74% [4.70%-4.78%], England: 5.76% [5.71%-5.80%], Norway: 8.30% [8.26%-8.34%]), reducing examinations sent to consensus/arbitration while the TPR did not change significantly. Other pairing strategies resulted in equal or worse performance than random pairing.
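To put the reported differences in perspective, the short calculation below derives the absolute and relative FPR reductions from the figures quoted in the Results. The numbers are taken from the text above. The interval check is a simple heuristic added here (non-overlapping 95% confidence intervals) and is not necessarily the significance test the authors used.

```python
# Paired FPR (%) with 95% CI bounds, as reported in the Results:
# (point estimate, lower bound, upper bound)
reported = {
    "Sweden": {"similar": (4.50, 4.46, 4.54), "random": (4.74, 4.70, 4.78)},
    "England": {"similar": (5.51, 5.47, 5.56), "random": (5.76, 5.71, 5.80)},
    "Norway": {"similar": (8.03, 7.99, 8.07), "random": (8.30, 8.26, 8.34)},
}

summary = {}
for country, rates in reported.items():
    sim, sim_lo, sim_hi = rates["similar"]
    rnd, rnd_lo, rnd_hi = rates["random"]
    abs_reduction = rnd - sim                    # percentage points
    rel_reduction = abs_reduction / rnd * 100.0  # percent of the random-pairing FPR
    intervals_separated = sim_hi < rnd_lo        # heuristic: CIs do not overlap
    summary[country] = (abs_reduction, rel_reduction, intervals_separated)
    print(
        f"{country}: -{abs_reduction:.2f} pp "
        f"({rel_reduction:.1f}% relative), CIs separated: {intervals_separated}"
    )
```

The absolute reductions are roughly a quarter of a percentage point in each program (0.24, 0.25, and 0.27), which corresponds to relative reductions of about 3% to 5% of the random-pairing FPR.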
Conclusions: Logistic regression models accurately predicted screening mammography assessments and helped explore different radiologist pairing strategies. Pairing readers with similar modeled FPR characteristics reduced the number of examinations unnecessarily sent to consensus/arbitration without significantly compromising the TPR.
Highlights: A logistic-regression model can be derived that accurately predicts individual and paired reader performance during mammography screening reading. Pairing screening mammography radiologists with similar false-positive characteristics reduced false-positive rates with no significant loss in true positives and may reduce the number of examinations unnecessarily sent to consensus/arbitration.
About the journal:
Medical Decision Making offers rigorous and systematic approaches to decision making that are designed to improve the health and clinical care of individuals and to assist with health care policy development. Using the fundamentals of decision analysis and theory, economic evaluation, and evidence-based quality assessment, Medical Decision Making presents both theoretical and practical statistical and modeling techniques and methods from a variety of disciplines.