Using HistoQC to predict disagreement on human epidermal growth factor receptor 2 (HER2) assessment
Laura Verzellesi, Moira Ragazzi, Andrea Botti, Giacomo Santandrea, Andrew Janowczyk, Luca Bottazzi, Alessandra Bisagni, Ione Tamagnini, Giorgio Gardini, Saverio Coiro, Elisa Gasparini, Mauro Iori
Physica Medica, European Journal of Medical Physics, vol. 137, Article 105066 (published 14 August 2025). DOI: 10.1016/j.ejmp.2025.105066
https://www.sciencedirect.com/science/article/pii/S1120179725001760
Citations: 0
Abstract
Background
The human epidermal growth factor receptor 2 (HER2) gene is a significant prognostic factor in breast cancer and a predictor of response to therapy. HER2 assessment is critical for determining eligibility for targeted therapy, but limited interobserver reproducibility is a well-known problem in HER2 evaluation.
Purpose
The goal of our study was to develop a machine learning (ML) system able to identify whole slide images (WSIs) that are likely to cause discrepancies between observers.
Methods
We collected 132 pathology slides with double-blind HER2 evaluation and defined interobserver agreement as a binary label: 0 for disagreement and 1 for agreement. We used the HistoQC software to analyze and characterize the slides through a series of quality-related features. These HistoQC-derived quality metrics were used to train a machine learning model (XGBoost) to predict interobserver disagreement. The dataset was randomly split into a training set (60%) and a testing set (40%).
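A minimal sketch of the labeling and splitting steps described above, written in standard-library Python. It assumes, hypothetically, that each slide carries two HER2 scores recorded as strings such as "0", "1+", "2+", "3+", and that "agreement" means the two scores are identical; the abstract does not state the exact agreement criterion. The names `agreement_label`, `split_60_40` and the slide IDs are illustrative only, not taken from the study or from HistoQC. The HistoQC feature extraction and the XGBoost fitting on the training rows are not reproduced here.

```python
import random


def agreement_label(score_a, score_b):
    """Binary target as in the abstract: 1 = agreement, 0 = disagreement.

    Assumption (hypothetical): scores are HER2 categories such as "0", "1+",
    "2+", "3+", and agreement means identical categories. The study's exact
    criterion is not given in the abstract."""
    return 1 if score_a == score_b else 0


def split_60_40(slide_ids, seed=0):
    """Randomly split slide IDs into training (60%) and testing (40%) sets."""
    ids = list(slide_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(0.6 * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical double-blind scores for three slides (illustrative data only).
paired_scores = {
    "slide_001": ("2+", "2+"),
    "slide_002": ("1+", "2+"),
    "slide_003": ("3+", "3+"),
}
labels = {sid: agreement_label(a, b) for sid, (a, b) in paired_scores.items()}
```

With 132 slides, this rounding choice gives 79 training and 53 testing slides. A stratified split (keeping the agreement/disagreement ratio equal in both sets) is a common alternative, but the abstract describes a plain random split.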
Results
Our model achieved a mean AUC of 0.86 (standard deviation, SD = 0.09) across five cross-validation runs on the training set, suggesting consistent predictive performance. On the testing set, the AUC was 0.81 (confidence interval, CI = [0.82–0.94]), indicating that the model can predict whether an unseen WSI is likely to lead to discordance.
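The metrics reported above can be illustrated with a small standard-library Python sketch: a pairwise (Mann-Whitney) ROC AUC, the mean and SD of per-fold AUCs, and a percentile bootstrap CI for a test-set AUC. The abstract does not say how its SD or CI were computed, so the sample SD and the percentile bootstrap used here are assumptions, and all function names are hypothetical.

```python
import random
import statistics


def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_cv(fold_aucs):
    """Mean and sample standard deviation of per-fold AUCs (e.g. five folds)."""
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.

    Assumption: one common CI method, not necessarily the study's."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # only one class drawn: AUC undefined, so redraw
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(alpha / 2 * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. The pairwise form is quadratic in the number of slides, which is negligible at the scale of a 132-slide dataset.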
Conclusions
Our study presents a machine learning model built to identify potential diagnostic disagreements in HER2 pathology evaluations. The results indicate an association between the quality of pathology slides and diagnostic outcomes. Upon proper validation, our tool could be integrated into the existing quality assurance systems used in anatomic pathology departments to improve the HER2 diagnostic process.
About the journal:
Physica Medica, European Journal of Medical Physics, published by Elsevier since 2007, provides an international forum for research and reviews on the following main topics:
Medical Imaging
Radiation Therapy
Radiation Protection
Measuring Systems and Signal Processing
Education and training in Medical Physics
Professional issues in Medical Physics