Comparative study of intra- and inter-observer variability in manual scoring of HER2 immunohistochemical stains on glass slides versus paired digital images with emphasis on the low end of the expression spectrum.
Andrew Xiao, Poonam Vohra, Yunn-Yi Chen, Leah Ung, Mi-Ok Kim, Joseph Geradts
Human Pathology, article 105860. Publication type: Journal Article. Published 2025-06-25. DOI: 10.1016/j.humpath.2025.105860 (https://doi.org/10.1016/j.humpath.2025.105860)
Citation count: 0
Abstract
With the advent of new therapeutic agents showing efficacy in human breast cancers with low levels of the HER2 oncoprotein, it has become important for pathologists to accurately categorize HER2 expression at the low end of the spectrum. At the same time, an increasing number of pathology laboratories are transitioning to a digital workflow. Our study was primarily designed to define inter-observer variability in manual scoring of HER2 stains and to investigate any differences in scoring of glass slides versus paired digital images. We studied 247 breast carcinomas, including 117 core biopsies and 130 excisional specimens. Tumors with a HER2 score of 0 were oversampled (n=100) and sub-classified as "null" or "ultralow". Inter-observer agreement was high among three experienced breast pathologists (kappa = 0.82-0.87). Intra-observer agreement between glass slides and paired digital images was also near perfect (kappa = 0.89-0.98). Discordant reads were noted in 10.1% of slide/image pairs, and in the majority of these the digital image score was higher. Most discordances were observed among null and ultralow cases. Consensus scoring of digital images yielded fewer null and more 1+ scores than scoring of glass slides. Between 25% and 48% of cases with a clinically reported HER2 score of 0 were sub-classified as null. Our study demonstrates that a high level of inter-observer agreement in manual HER2 scoring is achievable, even at the low end of the expression spectrum. Importantly, glass slide and image reads were largely concordant, but digital image scoring may be more sensitive at low immunohistochemical staining levels.
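The agreement figures above are kappa statistics, which correct raw agreement for the agreement expected by chance. The abstract does not say which kappa variant the authors used, so the following is only a minimal sketch of unweighted Cohen's kappa for one reader's glass and digital reads. The function name `cohen_kappa` and the ten example reads are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(reads_a, reads_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(reads_a) != len(reads_b) or not reads_a:
        raise ValueError("reads must be non-empty and of equal length")
    n = len(reads_a)
    # Observed agreement: fraction of cases where the two reads match.
    p_o = sum(a == b for a, b in zip(reads_a, reads_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over categories.
    count_a, count_b = Counter(reads_a), Counter(reads_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return 1.0  # every read in both lists falls in one identical category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical reads for 10 cases (illustrative only, NOT study data).
glass = ["null", "null", "ultralow", "ultralow", "1+", "1+", "2+", "2+", "3+", "3+"]
digital = ["null", "ultralow", "ultralow", "1+", "1+", "1+", "2+", "2+", "3+", "3+"]

kappa = cohen_kappa(glass, digital)
# Share of slide/image pairs whose two scores differ.
discordant = sum(g != d for g, d in zip(glass, digital)) / len(glass)
print(f"kappa = {kappa:.3f}, discordant = {discordant:.1%}")
# prints: kappa = 0.750, discordant = 20.0%
```

In this toy example both discordant pairs have the digital read one category higher than the glass read, the direction the abstract reports for most discordances. For ordered categories such as HER2 scores, a weighted kappa that penalizes near misses less than distant ones is a common alternative; whether the study used one is not stated.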
About the journal:
Human Pathology is designed to bring information of clinicopathologic significance to human disease to the laboratory and clinical physician. It presents information drawn from morphologic and clinical laboratory studies with direct relevance to the understanding of human diseases. Papers published concern morphologic and clinicopathologic observations, reviews of diseases, analyses of problems in pathology, significant collections of case material and advances in concepts or techniques of value in the analysis and diagnosis of disease. Theoretical and experimental pathology and molecular biology pertinent to human disease are included. This critical journal is well illustrated with exceptional reproductions of photomicrographs and microscopic anatomy.