{"title":"多阶段合作学习联合增强胸部x线自动诊断与放射学凝视预测","authors":"Zirui Qiu, Hassan Rivaz, Yiming Xiao","doi":"10.1002/mp.17977","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>As visual inspection is an inherent process during radiological screening, the associated eye gaze data can provide valuable insights into relevant clinical decision processes and facilitate computer-assisted diagnosis. However, the relevant techniques are still under-explored.</p>\n </section>\n \n <section>\n \n <h3> Purpose</h3>\n \n <p>With deep learning becoming the state-of-the-art for computer-assisted diagnosis, integrating human behavior, such as eye gaze data, into these systems is instrumental to help guide machine predictions with clinical diagnostic criteria, thus enhancing the quality of automatic radiological diagnosis. In addition, the ability to predict a radiologist's gaze saliency from a clinical scan along with the automatic diagnostic result could be instrumental for the end users.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>We propose a novel deep learning framework for joint disease diagnosis and prediction of corresponding radiological gaze saliency maps for chest x-ray scans. Specifically, we introduce a new dual-encoder multitask UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for visual saliency map prediction and a multiscale feature-fusion classifier to perform disease classification. To tackle the issue of asynchronous training schedules of individual tasks in multitask learning, we propose a multistage cooperative learning strategy, with contrastive learning for feature encoder pretraining to boost performance.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Our proposed method is shown to significantly outperform existing techniques for chest radiography diagnosis (AUC = 0.93) and the quality of visual saliency map prediction (correlation coefficient = 0.58).</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>Benefiting from the proposed multitask, multistage cooperative learning, our technique demonstrates the benefit of integrating clinicians' eye gaze into radiological AI systems to boost performance and potentially explainability.</p>\n </section>\n </div>","PeriodicalId":18384,"journal":{"name":"Medical physics","volume":"52 7","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/mp.17977","citationCount":"0","resultStr":"{\"title\":\"Joint enhancement of automatic chest x-ray diagnosis and radiological gaze prediction with multistage cooperative learning\",\"authors\":\"Zirui Qiu, Hassan Rivaz, Yiming Xiao\",\"doi\":\"10.1002/mp.17977\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>As visual inspection is an inherent process during radiological screening, the associated eye gaze data can provide valuable insights into relevant clinical decision processes and facilitate computer-assisted diagnosis. 
However, the relevant techniques are still under-explored.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Purpose</h3>\\n \\n <p>With deep learning becoming the state-of-the-art for computer-assisted diagnosis, integrating human behavior, such as eye gaze data, into these systems is instrumental to help guide machine predictions with clinical diagnostic criteria, thus enhancing the quality of automatic radiological diagnosis. In addition, the ability to predict a radiologist's gaze saliency from a clinical scan along with the automatic diagnostic result could be instrumental for the end users.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>We propose a novel deep learning framework for joint disease diagnosis and prediction of corresponding radiological gaze saliency maps for chest x-ray scans. Specifically, we introduce a new dual-encoder multitask UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for visual saliency map prediction and a multiscale feature-fusion classifier to perform disease classification. To tackle the issue of asynchronous training schedules of individual tasks in multitask learning, we propose a multistage cooperative learning strategy, with contrastive learning for feature encoder pretraining to boost performance.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>Our proposed method is shown to significantly outperform existing techniques for chest radiography diagnosis (AUC = 0.93) and the quality of visual saliency map prediction (correlation coefficient = 0.58).</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusion</h3>\\n \\n <p>Benefiting from the proposed multitask, multistage cooperative learning, our technique demonstrates the benefit of integrating clinicians' eye gaze into radiological AI systems to boost performance and potentially explainability.</p>\\n </section>\\n </div>\",\"PeriodicalId\":18384,\"journal\":{\"name\":\"Medical physics\",\"volume\":\"52 7\",\"pages\":\"\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/mp.17977\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical physics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/mp.17977\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical physics","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/mp.17977","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Joint enhancement of automatic chest x-ray diagnosis and radiological gaze prediction with multistage cooperative learning
Background
As visual inspection is an inherent part of radiological screening, the associated eye gaze data can provide valuable insights into clinical decision processes and facilitate computer-assisted diagnosis. However, techniques that exploit such gaze data remain under-explored.
Purpose
With deep learning becoming the state-of-the-art for computer-assisted diagnosis, integrating human behavior, such as eye gaze data, into these systems can help guide machine predictions with clinical diagnostic criteria, thus enhancing the quality of automatic radiological diagnosis. In addition, the ability to predict a radiologist's gaze saliency from a clinical scan, alongside the automatic diagnostic result, could be valuable for end users.
Methods
We propose a novel deep learning framework for joint disease diagnosis and prediction of the corresponding radiological gaze saliency maps for chest x-ray scans. Specifically, we introduce a new dual-encoder multitask UNet that leverages both a DenseNet201 backbone and an encoder built from Residual and Squeeze-and-Excitation blocks to extract diverse features for visual saliency map prediction, together with a multiscale feature-fusion classifier for disease classification. To tackle the asynchronous training schedules of the individual tasks in multitask learning, we propose a multistage cooperative learning strategy, with contrastive learning for feature-encoder pretraining to boost performance.
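To make the dual-encoder multitask idea concrete, below is a minimal PyTorch sketch. Only the overall structure (a DenseNet201 encoder plus a Residual/Squeeze-and-Excitation encoder feeding a saliency decoder and a classification head) follows the description above; all module names, channel widths, the class count, and the feature-fusion scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-encoder multitask network (assumes torch/torchvision).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import densenet201


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels via globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # excitation
        )

    def forward(self, x):
        return x * self.gate(x)


class ResSEBlock(nn.Module):
    """Residual block followed by an SE unit, for the second encoder."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        )
        self.se = SEBlock(out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return F.relu(self.se(self.body(x)) + self.skip(x))


class DualEncoderMultitaskNet(nn.Module):
    """Two encoders -> fused features -> saliency decoder + disease classifier."""
    def __init__(self, num_classes: int = 3):               # class count is a placeholder
        super().__init__()
        self.enc_dense = densenet201(weights=None).features  # 1920 ch at 1/32 scale
        self.enc_resse = nn.Sequential(                      # 128 ch at 1/4 scale
            ResSEBlock(3, 64), nn.MaxPool2d(2),
            ResSEBlock(64, 128), nn.MaxPool2d(2),
        )
        self.saliency_head = nn.Sequential(
            nn.Conv2d(1920 + 128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Sigmoid(),                                    # saliency map in [0, 1]
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(1920 + 128, num_classes),
        )

    def forward(self, x):
        f_dense = self.enc_dense(x)                          # (B, 1920, H/32, W/32)
        f_resse = self.enc_resse(x)                          # (B, 128, H/4, W/4)
        # Fuse the two feature streams at the finer (Res+SE) resolution.
        f_dense = F.interpolate(f_dense, size=f_resse.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = torch.cat([f_dense, f_resse], dim=1)
        return self.saliency_head(fused), self.classifier(fused)


if __name__ == "__main__":
    saliency, logits = DualEncoderMultitaskNet()(torch.randn(1, 3, 224, 224))
    print(saliency.shape, logits.shape)  # [1, 1, 224, 224] and [1, 3]
```

A training loop following the paper's multistage strategy would first pretrain the encoders (e.g., contrastively) and then schedule the saliency and classification losses cooperatively; those details are beyond this sketch.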
Results
Our proposed method significantly outperforms existing techniques in both chest radiography diagnosis (AUC = 0.93) and the quality of visual saliency map prediction (correlation coefficient = 0.58).
Conclusion
Benefiting from the proposed multitask, multistage cooperative learning, our technique demonstrates the value of integrating clinicians' eye gaze into radiological AI systems to boost performance and, potentially, explainability.
Journal Introduction
Medical Physics publishes original, high-impact physics, imaging science, and engineering research that advances patient diagnosis and therapy through contributions in:
1) Basic science developments with high potential for clinical translation
2) Clinical applications of cutting-edge engineering and physics innovations
3) Broadly applicable and innovative clinical physics developments
Medical Physics is a journal of global scope and reach. By publishing in Medical Physics, your research will reach an international, multidisciplinary audience, including practicing medical physicists as well as physics- and engineering-based translational scientists. We work closely with authors of promising articles to improve their quality.