Inter-rater agreement of HER2-low scores between expert breast pathologists and the Visiopharm digital image analysis application (HER2 APP, CE2797)
Suzanne Parry, Lila Zabaglo, Abeer M Shaaban, Andrew Dodson
Journal of Pathology: Clinical Research, vol. 11, no. 6 (published 16 October 2025). DOI: 10.1002/2056-4538.70051
Abstract
Inter-observer concordance data for HER2 category, as assessed by a group of 16 specialist breast pathologists on 50 diagnostic core biopsies, were compared with the scores produced by digital image analysis (DIA) using the HER2 APP, CE2797 (VP APP; Visiopharm, Hoersholm, Denmark). Pathologists' consensus scores and DIA scores agreed in 36 cases (73.5%). Fleiss' kappa was 0.433, indicating moderate agreement. Cohen's weighted kappa was used to compare each individual rater's scores with the consensus scores. Across all 50 cases, kappa values ranged from 0.412 to 0.854; the VP APP ranked 12th of 17 raters (kappa 0.638, indicating substantial agreement). For HER2-low cases (N = 44), kappa values ranged from 0.295 to 0.823, and the VP APP again ranked 12th of 17 (kappa 0.535, indicating moderate agreement). In high-agreement cases, kappa values ranged from 0.664 to 1.000 across all HER2 scores (N = 24), and the VP APP scored 0.916 (almost perfect agreement). For HER2-low scores among these cases (N = 20), kappa values ranged from 0.506 to 1.000, and the VP APP scored 0.860 (almost perfect agreement). DIA quantification of the proportion of tumour cells in each HER2 expression category showed that most cases with low agreement between pathologists were heterogeneous and/or had expression levels close to a decision-making cut-point. This study demonstrates that the VP APP produces results extremely well aligned with those of expert pathologists in cases with good overall agreement, and that in difficult cases its reproducibility will outperform that of visual scoring. The results also suggest that the VP APP could reduce the proportion of cases referred for gene amplification testing by reducing the number of cases incorrectly classified as HER2 2+.
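The abstract relies on three agreement measures: Fleiss' kappa across many raters, Cohen's weighted kappa for each rater against the consensus, and verbal labels such as "moderate" or "almost perfect". The Python sketch below shows how these are commonly computed; it is not the authors' or Visiopharm's implementation. Several details are assumptions: HER2 categories are coded as ordinal integers through a hypothetical `HER2_CODES` mapping (0, 1+, 2+, 3+ as 0 to 3), Cohen's kappa uses linear weights (the abstract does not state the weighting scheme), and the labels follow the Landis and Koch bands, which match the wording in the abstract but are not confirmed there as the authors' scheme. All function names are illustrative.

```python
from collections import Counter
from typing import Sequence

# Hypothetical ordinal coding of HER2 IHC categories (assumption, not from the paper).
HER2_CODES = {"0": 0, "1+": 1, "2+": 2, "3+": 3}


def cohen_weighted_kappa(a: Sequence[int], b: Sequence[int], k: int) -> float:
    """Linearly weighted Cohen's kappa for two raters scoring the same cases
    on k ordinal categories coded 0..k-1 (linear weights are an assumption)."""
    n = len(a)
    if k < 2 or n == 0 or n != len(b):
        raise ValueError("need k >= 2 and two non-empty score lists of equal length")
    if any(not 0 <= x < k for x in (*a, *b)):
        raise ValueError("scores must lie in 0..k-1")
    joint = Counter(zip(a, b))               # counts of (rater A, rater B) pairs
    marg_a, marg_b = Counter(a), Counter(b)  # each rater's category counts

    def weight(i: int, j: int) -> float:
        # Linear disagreement weight: 0 on the diagonal, 1 for the most distant categories.
        return abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    d_obs = sum(weight(i, j) * joint[(i, j)] for i, j in cells) / n
    d_exp = sum(weight(i, j) * marg_a[i] * marg_b[j] for i, j in cells) / n**2
    if d_exp == 0:
        raise ValueError("kappa undefined: both raters used one identical category")
    return 1.0 - d_obs / d_exp


def fleiss_kappa(ratings: Sequence[Sequence[int]], k: int) -> float:
    """Fleiss' kappa; ratings[i] lists every rater's category (0..k-1) for case i.
    Assumes the same number of raters scored every case."""
    if not ratings:
        raise ValueError("need at least one case")
    m = len(ratings[0])
    if m < 2 or any(len(r) != m for r in ratings):
        raise ValueError("need the same number (>= 2) of ratings for every case")
    n_cases = len(ratings)
    counts = [Counter(r) for r in ratings]  # n_ij: raters placing case i in category j
    # Per-case agreement P_i = (sum_j n_ij^2 - m) / (m(m - 1)), averaged over cases.
    p_bar = sum(
        (sum(c[j] ** 2 for j in range(k)) - m) / (m * (m - 1)) for c in counts
    ) / n_cases
    # Chance agreement from the pooled category proportions p_j.
    p_j = [sum(c[j] for c in counts) / (n_cases * m) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa undefined: every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy scores for illustration only; these are NOT the study's data.
    consensus = [HER2_CODES[s] for s in ["0", "1+", "1+", "2+", "2+", "3+"]]
    app = [HER2_CODES[s] for s in ["0", "1+", "2+", "2+", "1+", "3+"]]
    kw = cohen_weighted_kappa(consensus, app, k=4)
    print(f"weighted kappa {kw:.3f} ({landis_koch(kw)})")
    panel = [[0, 0, 1], [1, 1, 1], [1, 2, 2], [2, 2, 2], [3, 3, 3], [1, 1, 2]]
    fk = fleiss_kappa(panel, k=4)
    print(f"Fleiss' kappa {fk:.3f} ({landis_koch(fk)})")
```

Changing the weight function (for example, squaring `abs(i - j) / (k - 1)` to get quadratic weights) changes the resulting kappa, so values computed under different weighting schemes are not directly comparable.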
About the journal:
The Journal of Pathology: Clinical Research and The Journal of Pathology serve as translational bridges between basic biomedical science and clinical medicine with particular emphasis on, but not restricted to, tissue based studies.
The focus of The Journal of Pathology: Clinical Research is the publication of studies that illuminate the clinical relevance of research in the broad area of the study of disease. Appropriately powered and validated studies with novel diagnostic, prognostic and predictive significance, and studies of biomarker discovery and validation, are welcomed. Studies with a predominantly mechanistic basis will be more appropriate for the companion Journal of Pathology.