{"title":"一项多队列研究:迁移学习驱动he染色乳腺癌wsi的自动HER2评分","authors":"Xiaoping Li, Zhiquan Lin, Chaoran Qiu, Yiwen Zhang, Chuqian Lei, Shaofei Shen, Weibin Zhang, Chan Lai, Weiwen Li, Hui Huang, Tian Qiu","doi":"10.1186/s13058-025-02008-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Streamlining the clinical procedure of human epidermal growth factor receptor 2 (HER2) examination is challenging. Previous studies neglected the intra-class variability within both HER2-positive and -negative groups and lacked multi-cohort validation. To address this deficiency, this study collected data from multiple cohorts to develop a robust model for HER2 scoring utilizing only Hematoxylin&Eosin-stained whole slide images (WSIs).</p><p><strong>Methods: </strong>A total of 578 WSIs were collected from five cohorts, including three public and two private datasets. Each WSI underwent adaptive scale cropping. The transfer-learning-based probabilistic aggregation (TL-PA) model and multi-instance learning (MIL)-based models were compared, both of which were trained on Cohort A and validated on Cohorts B-D. The model demonstrating superior performance was further evaluated in the neoadjuvant therapy (NAT) cohort. Scoring performance was assessed using the area under the receiver operating characteristic curve (AUC). Correlation between the model scores and specific grades (HER2 levels, pathological complete response (pCR) status, residual cancer burden (RCB) grades) were evaluated using Spearman rank correlation and Dunn's test. Patch analysis was performed with manually defined features.</p><p><strong>Results: </strong>For HER2 scoring, the TL-PA significantly outperformed the MIL-based models, achieving robust AUCs in four validation cohorts (Cohort A: 0.75, Cohort B: 0.75, Cohort C: 0.77, Cohort D: 0.77). Correlation analysis confirmed a moderate association between model scores and manual reader-defined HER2-IHC status (Coefficient<sub>(Spearman)</sub> = 0.37, P<sub>(Spearman)</sub> = 0.001) as well as RCB grades (Coefficient<sub>(Spearman)</sub> = 0.45, P<sub>(Spearman)</sub> = 0.0006). In Cohort NAT, with the non-pCR as the positive control, the AUC was 0.77. Patch analysis revealed a core-to-peritumoral probability decrease pattern as malignancy spread outward from the lesion's core.</p><p><strong>Conclusion: </strong>TL-PA shows robust generalization for HER2 scoring with minimal data; however, it still inadequately capture intra-class variability. This indicates that future deep-learning endeavors should incorporate more detailed annotations to better align the model's focus with the reasoning of pathologists.</p>","PeriodicalId":49227,"journal":{"name":"Breast Cancer Research","volume":"27 1","pages":"62"},"PeriodicalIF":7.4000,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12020254/pdf/","citationCount":"0","resultStr":"{\"title\":\"Transfer learning drives automatic HER2 scoring on HE-stained WSIs for breast cancer: a multi-cohort study.\",\"authors\":\"Xiaoping Li, Zhiquan Lin, Chaoran Qiu, Yiwen Zhang, Chuqian Lei, Shaofei Shen, Weibin Zhang, Chan Lai, Weiwen Li, Hui Huang, Tian Qiu\",\"doi\":\"10.1186/s13058-025-02008-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Streamlining the clinical procedure of human epidermal growth factor receptor 2 (HER2) examination is challenging. Previous studies neglected the intra-class variability within both HER2-positive and -negative groups and lacked multi-cohort validation. To address this deficiency, this study collected data from multiple cohorts to develop a robust model for HER2 scoring utilizing only Hematoxylin&Eosin-stained whole slide images (WSIs).</p><p><strong>Methods: </strong>A total of 578 WSIs were collected from five cohorts, including three public and two private datasets. Each WSI underwent adaptive scale cropping. The transfer-learning-based probabilistic aggregation (TL-PA) model and multi-instance learning (MIL)-based models were compared, both of which were trained on Cohort A and validated on Cohorts B-D. The model demonstrating superior performance was further evaluated in the neoadjuvant therapy (NAT) cohort. Scoring performance was assessed using the area under the receiver operating characteristic curve (AUC). Correlation between the model scores and specific grades (HER2 levels, pathological complete response (pCR) status, residual cancer burden (RCB) grades) were evaluated using Spearman rank correlation and Dunn's test. Patch analysis was performed with manually defined features.</p><p><strong>Results: </strong>For HER2 scoring, the TL-PA significantly outperformed the MIL-based models, achieving robust AUCs in four validation cohorts (Cohort A: 0.75, Cohort B: 0.75, Cohort C: 0.77, Cohort D: 0.77). Correlation analysis confirmed a moderate association between model scores and manual reader-defined HER2-IHC status (Coefficient<sub>(Spearman)</sub> = 0.37, P<sub>(Spearman)</sub> = 0.001) as well as RCB grades (Coefficient<sub>(Spearman)</sub> = 0.45, P<sub>(Spearman)</sub> = 0.0006). In Cohort NAT, with the non-pCR as the positive control, the AUC was 0.77. Patch analysis revealed a core-to-peritumoral probability decrease pattern as malignancy spread outward from the lesion's core.</p><p><strong>Conclusion: </strong>TL-PA shows robust generalization for HER2 scoring with minimal data; however, it still inadequately capture intra-class variability. This indicates that future deep-learning endeavors should incorporate more detailed annotations to better align the model's focus with the reasoning of pathologists.</p>\",\"PeriodicalId\":49227,\"journal\":{\"name\":\"Breast Cancer Research\",\"volume\":\"27 1\",\"pages\":\"62\"},\"PeriodicalIF\":7.4000,\"publicationDate\":\"2025-04-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12020254/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Breast Cancer Research\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1186/s13058-025-02008-7\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Breast Cancer Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s13058-025-02008-7","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Medicine","Score":null,"Total":0}
Transfer learning drives automatic HER2 scoring on HE-stained WSIs for breast cancer: a multi-cohort study.
Background: Streamlining the clinical procedure of human epidermal growth factor receptor 2 (HER2) examination is challenging. Previous studies neglected the intra-class variability within both HER2-positive and -negative groups and lacked multi-cohort validation. To address this deficiency, this study collected data from multiple cohorts to develop a robust model for HER2 scoring utilizing only Hematoxylin&Eosin-stained whole slide images (WSIs).
Methods: A total of 578 WSIs were collected from five cohorts, including three public and two private datasets. Each WSI underwent adaptive scale cropping. The transfer-learning-based probabilistic aggregation (TL-PA) model and multi-instance learning (MIL)-based models were compared, both of which were trained on Cohort A and validated on Cohorts B-D. The model demonstrating superior performance was further evaluated in the neoadjuvant therapy (NAT) cohort. Scoring performance was assessed using the area under the receiver operating characteristic curve (AUC). Correlation between the model scores and specific grades (HER2 levels, pathological complete response (pCR) status, residual cancer burden (RCB) grades) were evaluated using Spearman rank correlation and Dunn's test. Patch analysis was performed with manually defined features.
Results: For HER2 scoring, the TL-PA significantly outperformed the MIL-based models, achieving robust AUCs in four validation cohorts (Cohort A: 0.75, Cohort B: 0.75, Cohort C: 0.77, Cohort D: 0.77). Correlation analysis confirmed a moderate association between model scores and manual reader-defined HER2-IHC status (Coefficient(Spearman) = 0.37, P(Spearman) = 0.001) as well as RCB grades (Coefficient(Spearman) = 0.45, P(Spearman) = 0.0006). In Cohort NAT, with the non-pCR as the positive control, the AUC was 0.77. Patch analysis revealed a core-to-peritumoral probability decrease pattern as malignancy spread outward from the lesion's core.
Conclusion: TL-PA shows robust generalization for HER2 scoring with minimal data; however, it still inadequately capture intra-class variability. This indicates that future deep-learning endeavors should incorporate more detailed annotations to better align the model's focus with the reasoning of pathologists.
期刊介绍:
Breast Cancer Research, an international, peer-reviewed online journal, publishes original research, reviews, editorials, and reports. It features open-access research articles of exceptional interest across all areas of biology and medicine relevant to breast cancer. This includes normal mammary gland biology, with a special emphasis on the genetic, biochemical, and cellular basis of breast cancer. In addition to basic research, the journal covers preclinical, translational, and clinical studies with a biological basis, including Phase I and Phase II trials.