Systematic Analysis of Factors Affecting HER2 Interpretation Consistency: Staining Protocols, AI-Based Image Standardization, and Classification Criteria
Chen Jiang, Mei Li, Chengyou Zheng, Shumei Yan, Lingzhi Kong, Yu Wu, Jinhui Zhang, Xue Chao, Xi Cai, Wentai Feng, Jiehua He, Rongzhen Luo, Shuoyu Xu, Yuanzhong Yang, Peng Sun
Laboratory Investigation, 104134. Published 2025-03-23. Journal Article. Impact factor 5.1 (JCR Q1, Medicine, Research & Experimental).
DOI: 10.1016/j.labinv.2025.104134 (https://doi.org/10.1016/j.labinv.2025.104134)
Citations: 0
Abstract
The efficacy of HER2-targeting antibody-drug conjugates (ADCs) has underscored the need for precise HER2 diagnostics in breast cancer treatment. Despite this clinical importance, variability in immunohistochemical (IHC) staining protocols and inter-observer inconsistency undermine the reliability of HER2 status assessment, which guides patient treatment strategies. To investigate the factors affecting HER2 interpretation consistency, tissue microarrays (TMAs) from 1063 breast carcinoma cases were stained with three distinct IHC protocols, and a novel artificial intelligence (AI) model was developed to standardize HER2-stained images. Five sets of TMAs (Nordi QC, Protocol 1, Protocol 2, Protocol 1 AI, Protocol 2 AI) were independently reviewed by eight pathologists. Inter-observer agreement was measured with Fleiss' kappa and the overall agreement rate, and logistic regression was used to analyze the impact of variables on diagnostic accuracy. The Nordi QC protocol had the highest inter-observer agreement (kappa 0.754). AI-based image normalization notably enhanced consistency, particularly for HER2-low cases, aligning scores towards the Nordi QC standard. Logistic regression analysis indicated that both staining protocol and AI-based image standardization significantly influenced diagnostic accuracy (p < 0.001). The ASCO/CAP 2018 binary criteria demonstrated the highest HER2 inter-observer consistency (kappa > 0.95). Compared to the ASCO/CAP 2023 criteria, the newly proposed NULP criteria, which merge the HER2 low and ultra-low categories, demonstrated improved reliability and agreement. The gain was most apparent for the challenging HER2 ultra-low cases, which showed exceedingly low inter-observer agreement (kappa < 0.20) across all protocols. Overall, variability in IHC staining protocols and HER2 classification criteria significantly affects diagnostic consistency among pathologists. The integration of an AI model for image standardization and the adoption of the NULP criteria may refine diagnostic precision and bolster clinical decision-making in breast cancer treatment.
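The agreement statistic named in the abstract, Fleiss' kappa, can be illustrated with a short Python sketch. It is not the authors' code: the function names `fleiss_kappa` and `merge_labels`, the label strings, and the toy ratings are all assumptions made for illustration, and the numbers it produces say nothing about the study's data. It shows how kappa is computed when every case is scored by the same number of raters, and how merging two categories (as the NULP criteria do with low and ultra-low) changes the value.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of labels (one per rater).

    Every case must be scored by the same number of raters (at least two).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]

    # Observed agreement: per-case share of agreeing rater pairs, averaged.
    per_case = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement: sum of squared overall category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_expected = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())

    if p_expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


def merge_labels(ratings, mapping):
    """Relabel ratings, e.g. to collapse two categories into one."""
    return [[mapping.get(label, label) for label in row] for row in ratings]


# Made-up toy ratings (3 raters, 2 cases); labels are hypothetical.
toy = [["low", "ultra-low", "low"], ["0", "0", "0"]]
kappa_split = fleiss_kappa(toy)
kappa_merged = fleiss_kappa(merge_labels(toy, {"ultra-low": "low"}))
print(kappa_split, kappa_merged)
```

In this toy case the only disagreement is between "low" and "ultra-low", so collapsing the two removes it entirely. Real data would also contain disagreements that cross the merged boundary, so the gain from merging would be smaller.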
About the journal:
Laboratory Investigation is an international journal owned by the United States and Canadian Academy of Pathology. Laboratory Investigation offers prompt publication of high-quality original research in all biomedical disciplines relating to the understanding of human disease and the application of new methods to the diagnosis of disease. Both human and experimental studies are welcome.