Artificial intelligence-assisted HER2 interpretation for breast cancers in a multi-laboratory study.
Libo Yang, Jie Chen, Leyi Gao, Fengling Li, Xudan Yang, Juan Ji, Pei Zhang, Ping Hua, Xiulan Liu, Rong Wang, Zhenru Wu, Fei Chen, Bing Wei, Zhang Zhang
Gland Surgery, volume 14, issue 6, pages 1042-1051. Published 2025-06-30 (Epub 2025-06-26). Journal Article.
DOI: https://doi.org/10.21037/gs-2024-560
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261348/pdf/
Abstract
Background: Improving the concordance of human epidermal growth factor receptor 2 (HER2) examinations among laboratories remains a challenge. In this multi-laboratory study, we investigated the concordance of HER2 immunohistochemistry (IHC) examination through manual and artificial intelligence (AI)-assisted interpretation.
Methods: A tissue microarray (TMA) comprising 53 breast cancer samples was constructed and distributed to 35 participating laboratories. Each sample on every slide was assigned an IHC score of 0, 1+, 2+, or 3+. Subsequently, cases that failed to achieve complete agreement during manual interpretation were re-evaluated using an AI-assisted microscope.
Results: During manual interpretation, 14 of 53 cases (26.4%) were scored concordantly by all laboratories: 13 IHC-0 cases and 1 IHC-3+ case. Notably, cases scored as 1+ by at least one laboratory exhibited low overall percentage agreement (OPA) and low Fleiss' kappa values. Among the 39 cases with non-concordant manual interpretation, 14 (35.9%) achieved complete agreement through AI-assisted HER2 interpretation. In cases where manual interpretation discrepancies were restricted to scores of 0 and 1+, 69.6% (16/23) still showed differences between 0 and 1+ under AI-assisted HER2 interpretation. Disagreements between manual and AI-assisted interpretation occurred significantly more frequently in sections manually scored as 1+ than in those scored as 0 (58.6% vs. 2.1%, P<0.001).
Conclusions: The weakly staining phenotype leads to poor agreement in the manual interpretation of HER2 IHC-1+ breast cancers. AI-assisted HER2 interpretation offers a viable approach for multi-laboratory studies, effectively avoiding the subjective errors inherent in manual interpretation.
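The Results refer to overall percentage agreement (OPA) and Fleiss' kappa without defining them in the abstract. The sketch below shows one common way to compute both for a cases-by-laboratories table of IHC scores. It is an illustration only: the toy scores are invented, the function names are hypothetical, and taking OPA as the share of agreeing laboratory pairs is an assumption, since the abstract does not give the study's exact definition.

```python
import math
from collections import Counter

# IHC score categories named in the Methods.
CATEGORIES = ("0", "1+", "2+", "3+")


def pairwise_agreement(case_scores):
    """Share of rater pairs giving the same score for one case.

    Assumed here as a per-case OPA; the study may define OPA differently.
    """
    n = len(case_scores)
    if n < 2:
        raise ValueError("need at least two raters per case")
    counts = Counter(case_scores)
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of scores.

    Every case must be scored by the same number of raters.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("all cases must have the same number of raters")

    # Mean observed agreement across cases.
    p_bar = sum(pairwise_agreement(case) for case in ratings) / n_cases

    # Chance agreement from the overall share of each category.
    totals = Counter(score for case in ratings for score in case)
    n_scores = n_cases * n_raters
    p_e = sum((totals[cat] / n_scores) ** 2 for cat in CATEGORIES)

    if math.isclose(p_e, 1.0):
        return math.nan  # undefined when every score falls in one category
    return (p_bar - p_e) / (1 - p_e)


# Invented toy data: 3 cases scored by 4 hypothetical laboratories.
toy_ratings = [
    ["0", "0", "0", "0"],
    ["0", "1+", "1+", "1+"],
    ["3+", "3+", "3+", "3+"],
]

if __name__ == "__main__":
    for case in toy_ratings:
        print(case, pairwise_agreement(case))
    print("Fleiss' kappa:", fleiss_kappa(toy_ratings))
```

In the toy data, only the case with a mix of 0 and 1+ scores falls below full agreement, which mirrors the pattern the abstract describes for weakly staining cases.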
About the journal:
Gland Surgery (Gland Surg; GS; Print ISSN 2227-684X; Online ISSN 2227-8575) is an open access, peer-reviewed journal indexed by PubMed/PubMed Central. It was launched in May 2012 and has been published bimonthly since February 2015.