Generalizability of convolutional neural network-based model observer in breast tomosynthesis across volume glandular fractions and signal sizes
Hanjoo Jang, Jongduk Baek
Medical Physics, 2025-02-27. DOI: 10.1002/mp.17725
Abstract
Background: In our previous study, we proposed a convolutional neural network (CNN)-based model observer for signal known statistically (SKS) and background known statistically (BKS) tasks. We used it to assess the detection performance of breast tomosynthesis systems across different acquisition angles and numbers of projections at a constant dose level. That study showed that the CNN-based model observer has significant potential for approximating ideal observer (IO) performance. However, further research is needed to extend its applicability to clinically relevant tasks and to validate its robustness across diverse imaging scenarios.
Purpose: Establishing the generalizability of the CNN-based model observer is essential for its practical use in diagnostic imaging. In this work, we examined the generalizability of a CNN-based model observer for SKS and BKS detection tasks in breast tomosynthesis images with two volume glandular fractions (VGFs; 30% and 50%) and two sizes of spiculated signals (1 and 2 mm). The aim is to better understand which factors influence network optimization for consistent and robust detection performance.
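Detection performance in such tasks is commonly summarized by an observer SNR (d') or the AUC, and the Hotelling observer (HO) is the linear reference used in this work. The following is a minimal sketch of how HO detectability is typically estimated from image samples. It is not the authors' implementation: it computes d'² = Δᵀ K⁻¹ Δ, where Δ is the mean difference between signal-present and signal-absent images and K is the average class covariance, and then converts d' to AUC. The function names and the toy data (white Gaussian noise, a fixed 4-pixel signal) are hypothetical and much simpler than the SKS/BKS tomosynthesis tasks. With realistic ROI sizes, the pixel covariance is large, and estimating it stably becomes the main practical difficulty.

```python
import numpy as np
from math import erf


def hotelling_snr(signal_present: np.ndarray, signal_absent: np.ndarray) -> float:
    """Sample estimate of the Hotelling-observer SNR (d').

    Both inputs have shape (n_images, n_pixels): flattened images or ROIs.
    """
    # Mean signal difference between the two classes
    delta = signal_present.mean(axis=0) - signal_absent.mean(axis=0)
    # Average of the two class covariance matrices (columns are pixels)
    cov = 0.5 * (np.cov(signal_present, rowvar=False)
                 + np.cov(signal_absent, rowvar=False))
    # Hotelling template w = K^{-1} delta, via a linear solve rather than an inverse
    template = np.linalg.solve(cov, delta)
    return float(np.sqrt(delta @ template))


def auc_from_snr(snr: float) -> float:
    """AUC = Phi(d' / sqrt(2)), valid for equal-variance Gaussian test statistics."""
    return 0.5 * (1.0 + erf(snr / 2.0))


if __name__ == "__main__":
    # Hypothetical toy task: white Gaussian noise and a fixed 4-pixel signal,
    # for which the analytic d' is sqrt(4 * 0.5**2) = 1.0
    rng = np.random.default_rng(0)
    signal = np.zeros(16)
    signal[:4] = 0.5
    absent = rng.normal(size=(20000, 16))
    present = rng.normal(size=(20000, 16)) + signal
    d = hotelling_snr(present, absent)
    print(f"d' = {d:.3f}, AUC = {auc_from_snr(d):.3f}")
```

In this toy case, the estimated d' should land close to the analytic value of 1.0, which corresponds to an AUC of about 0.76.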
Methods: Five network architectures were used to test whether matching the receptive field (RF) size to the signal size would improve detection performance. The networks were designed in terms of their theoretical receptive field (TRF) size. The detection performance of the CNN-based model observer was compared with that of the Hotelling observer (HO) under various training and testing schemes to identify the key factors in optimizing the network for generalizability.
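The networks are characterized by TRF size, so a short sketch of the standard TRF recursion may help. Each layer widens the receptive field by (effective kernel − 1) × (cumulative stride so far) pixels. The `Layer` type and `min_depth_to_cover` helper are hypothetical illustrations. The abstract does not specify the five architectures or the image pixel size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One convolution or pooling layer (hypothetical description)."""
    kernel: int
    stride: int = 1
    dilation: int = 1


def theoretical_rf(layers: list[Layer]) -> int:
    """Theoretical receptive field (pixels, per axis) of a stack of layers."""
    rf, jump = 1, 1  # start from one input pixel; jump = cumulative stride
    for layer in layers:
        effective_kernel = layer.dilation * (layer.kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= layer.stride
    return rf


def min_depth_to_cover(signal_px: int, kernel: int = 3) -> int:
    """Fewest stride-1 conv layers whose TRF spans at least signal_px pixels."""
    if kernel < 2:
        raise ValueError("kernel must be >= 2 for the TRF to grow")
    depth = 0
    while theoretical_rf([Layer(kernel)] * depth) < signal_px:
        depth += 1
    return depth
```

For example, under an assumed 0.1 mm pixel pitch, a 2 mm signal spans 20 pixels. In that case, `min_depth_to_cover(20)` returns 10 stacked 3×3 convolutions (TRF of 21 pixels). As the Results report, a TRF-to-signal match of this kind did not by itself improve detection performance.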
Results: Throughout the study, we showed that during training, each network focused more on discriminating the presence of similarly sized signals than on achieving robustness to noise variations. The CNN-based model observer outperformed the HO, except when the training and testing datasets contained signals of different sizes. Networks trained on datasets containing signals of both sizes generalized better than those trained on mixed-VGF datasets (i.e., datasets comprising both VGFs). Contrary to our hypothesis, matching the TRF size to the signal size did not improve detection performance. This led us to examine the effective receptive field (ERF) size of the network as a descriptive metric of network generalizability, using pixelwise gradient activation mapping (pGrad-CAM). We found a relationship between ERF size and signal size, which indicates that ERF size is clearly relevant to the detection performance of the CNN-based model observer.
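The abstract estimates ERF size with pGrad-CAM, which is not reproduced here. As a simpler, hedged illustration of why the ERF can be much smaller than the TRF, the sketch below uses a toy linear network of stacked stride-1 3×3 box filters. For such a network, the gradient of the centre output with respect to the input equals the composite kernel. The sketch measures ERF as the equivalent-circle diameter of the region where the gradient magnitude is at least 5% of its peak. The toy network, the 5% threshold, and the function names are assumptions for illustration only.

```python
import numpy as np


def linear_stack_gradient_map(n_layers: int, kernel: int = 3) -> np.ndarray:
    """Gradient of the centre output w.r.t. the input for a toy network of
    n_layers stride-1 box (uniform) filters with no nonlinearities.

    For such a linear network the gradient equals the composite kernel,
    and box filters are separable, so it is built from a 1-D profile.
    """
    box = np.ones(kernel) / kernel
    profile = np.array([1.0])
    for _ in range(n_layers):
        # Full convolution widens the support by (kernel - 1) each layer
        profile = np.convolve(profile, box)
    return np.outer(profile, profile)


def erf_diameter(grad_map: np.ndarray, frac: float = 0.05) -> float:
    """Equivalent-circle diameter (pixels) of the region where
    |gradient| >= frac * max|gradient|; frac is an assumed threshold."""
    magnitude = np.abs(grad_map)
    area = np.count_nonzero(magnitude >= frac * magnitude.max())
    return float(2.0 * np.sqrt(area / np.pi))


if __name__ == "__main__":
    for depth in (10, 20):
        g = linear_stack_gradient_map(depth)
        print(f"layers={depth}: TRF={g.shape[0]} px, ERF~{erf_diameter(g):.1f} px")
```

With 10 layers, the TRF is 21 pixels per axis, yet the thresholded ERF diameter is only about 13 pixels because the gradient concentrates near the centre. Adding layers enlarges the ERF more slowly than the TRF.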
Conclusions: Networks trained on datasets with similarly sized signals achieved optimal detection performance but showed limited generalizability to datasets with signals of different sizes. Networks trained on datasets containing signals of both sizes generalized better. This work suggests that task-based exploration is crucial for designing CNN-based model observers that perform and generalize consistently. It also provides guidance for developing robust networks across varying imaging scenarios.