Systematic Analysis of Factors Affecting Human Epidermal Growth Factor Receptor 2 Interpretation Consistency: Staining Protocols, Artificial Intelligence–Based Image Standardization, and Classification Criteria

IF 5.1 | Zone 2, Medicine | JCR Q1, MEDICINE, RESEARCH & EXPERIMENTAL

Laboratory Investigation | Pub Date: 2025-03-23 | DOI: 10.1016/j.labinv.2025.104134

Chen Jiang, Mei Li, Chengyou Zheng, Shumei Yan, Lingzhi Kong, Yu Wu, Jinhui Zhang, Xue Chao, Xi Cai, Wentai Feng, Jiehua He, Rongzhen Luo, Shuoyu Xu, Yuanzhong Yang, Peng Sun

Volume 105, Issue 6, Article 104134. https://www.sciencedirect.com/science/article/pii/S0023683725000443

Citations: 0

Abstract

The efficacy of human epidermal growth factor receptor 2 (HER2)–targeting antibody-drug conjugates has underscored the critical need for precise HER2 diagnostics in breast cancer treatment. Although HER2 status assessment is critical for guiding patient treatment strategies, its reliability is challenged by variability in immunohistochemical (IHC) staining protocols and by interobserver inconsistency. To investigate the factors affecting HER2 interpretation consistency, tissue microarrays (TMAs) from 1063 breast carcinoma cases were stained using 3 distinct IHC protocols, and a novel artificial intelligence (AI) model was developed to standardize HER2-stained images. A total of 5 sets of TMAs (Nordi QC, protocol 1, protocol 2, protocol 1 AI, and protocol 2 AI) were independently reviewed by 8 pathologists. Interobserver agreement was measured with the Fleiss Kappa value and the overall agreement rate, and logistic regression was used to analyze the impact of variables on diagnostic accuracy. Our results showed that the Nordi QC protocol had the highest interobserver agreement (Kappa 0.754). AI-based image normalization notably enhanced consistency, particularly for HER2-low cases, shifting scores toward the Nordi QC standard. Logistic regression analysis indicated that both the staining protocol and AI-based image standardization significantly influenced diagnostic accuracy (P < .001). The American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) 2018 binary criteria yielded the highest HER2 interobserver consistency (Kappa > 0.95). Compared with the ASCO/CAP 2023 criteria, the newly proposed null, ultra-low/low, positive (NULP) criteria, which merge the HER2-low and ultra-low categories, demonstrated improved reliability and agreement, especially for the challenging HER2–ultra-low cases, which showed exceedingly low interobserver agreement (Kappa < 0.20) across all protocols. Overall, variability in IHC staining protocols and HER2 classification criteria significantly affects diagnostic consistency among pathologists. Integrating an AI model for image standardization and adopting the NULP criteria may refine diagnostic precision and support clinical decision-making in breast cancer treatment.
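Fleiss' kappa, the agreement statistic used in the abstract, compares the observed proportion of agreeing rater pairs (P) with the agreement expected by chance from the pooled category frequencies (Pe): kappa = (P - Pe) / (1 - Pe). The Python sketch below implements that standard formula. The abstract does not define its "overall agreement rate", so here it is assumed to mean the fraction of cases on which every rater gave the same score; the function names and the four-case, four-rater scores are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[i] holds the labels that n raters gave case i."""
    if not ratings:
        raise ValueError("need at least one case")
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    counts = [Counter(case) for case in ratings]  # n_ij: raters placing case i in category j
    categories = set().union(*counts)

    # Observed agreement: share of agreeing rater pairs per case, averaged over cases
    pairs = n_raters * (n_raters - 1)
    p_bar = sum(sum(c * (c - 1) for c in cnt.values()) / pairs for cnt in counts) / n_cases

    # Chance agreement: sum of squared pooled category proportions
    total = n_cases * n_raters
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        return float("nan")  # every rating in one category: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def full_agreement_rate(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of cases on which all raters agree (assumed definition of overall agreement)."""
    return sum(len(set(case)) == 1 for case in ratings) / len(ratings)


# Hypothetical IHC scores: 4 cases x 4 raters (not study data)
hypothetical_scores = [
    ["0", "0", "0", "0"],
    ["1+", "1+", "2+", "1+"],
    ["3+", "3+", "3+", "3+"],
    ["0", "1+", "0", "1+"],
]
print(f"kappa = {fleiss_kappa(hypothetical_scores):.3f}")  # kappa = 0.581
print(f"full agreement = {full_agreement_rate(hypothetical_scores):.2f}")  # full agreement = 0.50
```

Because Pe depends on the pooled category frequencies, the same raw agreement can give different kappa values when the case mix shifts, which is one reason kappa is usually reported alongside a simple agreement rate.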

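How the classification criteria compared in the abstract can change agreement is easiest to see by scoring the same reads under different category groupings. The sketch below is a simplified reading of the abstract, not the study's definitions: each read is assumed to already be one of four tiers (null, ultra-low, low, positive, with IHC 2+ cases assumed resolved by ISH), the NULP scheme merges ultra-low and low, and a 2018-style binary scheme collapses everything below positive into negative. The scheme names, labels, and toy reads are hypothetical. Merging categories turns disagreements across the merged boundary into agreements, but it also raises chance agreement, so the net effect on kappa depends on the data.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for cases that were each rated by the same number of raters."""
    n_cases, n_raters = len(ratings), len(ratings[0])
    counts = [Counter(case) for case in ratings]
    pairs = n_raters * (n_raters - 1)
    p_bar = sum(sum(c * (c - 1) for c in cnt.values()) / pairs for cnt in counts) / n_cases
    totals = sum(counts, Counter())  # category counts pooled over all cases
    p_e = sum((v / (n_cases * n_raters)) ** 2 for v in totals.values())
    return float("nan") if p_e == 1.0 else (p_bar - p_e) / (1 - p_e)


# Hypothetical mappings from a four-tier read to each classification scheme
SCHEMES: Dict[str, Dict[str, str]] = {
    "four-tier": {"null": "null", "ultralow": "ultralow",
                  "low": "low", "positive": "positive"},
    "NULP": {"null": "null", "ultralow": "ultralow/low",
             "low": "ultralow/low", "positive": "positive"},
    "binary": {"null": "negative", "ultralow": "negative",
               "low": "negative", "positive": "positive"},
}


def remap(ratings: Sequence[Sequence[str]], mapping: Dict[str, str]) -> List[List[str]]:
    """Translate every rater's label into the categories of one scheme."""
    return [[mapping[label] for label in case] for case in ratings]


# Toy reads: 5 cases x 4 raters, invented only to illustrate the mechanism
toy_reads = [
    ["null", "null", "null", "ultralow"],
    ["ultralow", "low", "low", "ultralow"],
    ["low", "low", "low", "low"],
    ["positive", "positive", "positive", "positive"],
    ["null", "ultralow", "null", "ultralow"],
]

for name, mapping in SCHEMES.items():
    print(f"{name:>9}: kappa = {fleiss_kappa(remap(toy_reads, mapping)):.3f}")
# four-tier: kappa = 0.508
#      NULP: kappa = 0.608
#    binary: kappa = 1.000
```

In these invented reads every disagreement falls between non-positive tiers, so the binary grouping reaches perfect agreement; real reads would not behave this neatly, and the numbers say nothing about the kappa values reported in the study.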

Source journal: Laboratory Investigation (Medicine: Pathology)
CiteScore: 8.30
Self-citation rate: 0.00%
Articles published: 125
Review time: 2 months

Journal description: Laboratory Investigation is an international journal owned by the United States and Canadian Academy of Pathology. It offers prompt publication of high-quality original research in all biomedical disciplines related to the understanding of human disease and the application of new methods to its diagnosis. Both human and experimental studies are welcome.