Development and clinical validation of deep learning-based immunohistochemistry prediction models for subtyping and staging of gastrointestinal cancers.

IF 2.5; Zone 3, Medicine; Q2, Gastroenterology & Hepatology
Junxiao Wang, Shiying Zhang, Jia Li, Mei Deng, Zhi Zeng, Zehua Dong, Fangfang Chen, Wen Liu, Lianlian Wu, Honggang Yu
BMC Gastroenterology 25(1):494. Published 2025-07-01. DOI: 10.1186/s12876-025-04045-0 (https://doi.org/10.1186/s12876-025-04045-0). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211442/pdf/

Abstract

Background: Immunohistochemistry (IHC) is a critical tool for tumor diagnosis and treatment, but it is time- and tissue-consuming and highly dependent on skilled laboratory technicians. Many deep learning-based IHC biomarker prediction models have been developed recently, but few studies have examined how effective they are in clinical use.

Methods: In this study, we aimed to create an automatic pipeline for constructing deep learning models that generate artificial intelligence-based IHC (AI-IHC) output from H&E whole slide images (WSIs), and to compare the pathology reports that pathologists produced from AI-IHC with those produced from conventional IHC. We obtained 134 WSIs comprising paired H&E and IHC slides, and automatically extracted 415,463 tiles from the H&E slides for model construction, with annotations transferred from the IHC slides. Five IHC biomarker prediction models (P40, Pan-CK, Desmin, P53, Ki-67) were developed to support a range of clinically relevant diagnostic applications across gastrointestinal cancer subtypes, including esophageal, gastric, and colorectal cancers. The Ki-67 proliferation index was quantitatively assessed using digital image analysis.
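The abstract does not say how the digital image analysis derives the Ki-67 proliferation index. As a minimal sketch, assuming the conventional definition (percentage of counted tumor nuclei that stain positive), the calculation could look like the following. The function name and the nucleus counts are hypothetical and are not taken from the study.

```python
def ki67_index(positive_nuclei: int, total_nuclei: int) -> float:
    """Return the Ki-67 proliferation index as a percentage.

    Assumes the conventional definition: Ki-67-positive tumor nuclei
    divided by all counted tumor nuclei, times 100. The study's own
    image-analysis method is not described in the abstract.
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


# Hypothetical counts for one region of interest.
example_index = ki67_index(positive_nuclei=30, total_nuclei=200)
print(f"Ki-67 index: {example_index:.1f}%")
```

In practice the nucleus counts would come from a segmentation or detection step, which is outside the scope of this sketch.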

Results: The AUCs of the five IHC biomarker models ranged from 0.90 to 0.96, and the accuracies ranged from 83.04% to 90.81%. An additional 150 WSIs from 30 patients were collected to assess the effectiveness of AI-IHC in a multi-reader multi-case (MRMC) study. Each case was read by three pathologists, once on AI-IHC and once on conventional IHC, with a minimum 2-week washout period between reads. The consistency rates between pathologists' AI-IHC and conventional IHC reads were high for Desmin, Pan-CK and P40 (96.67-100%) and moderate for P53 (70.00%). We also evaluated T-stage from the staining of these IHC biomarkers, and the consistency rate was 86.36%. Furthermore, the Ki-67 proliferation index reported by AI-IHC showed a variability of 17.35% ± 16.2% relative to conventional IHC, with an intraclass correlation coefficient (ICC) of 0.415 (P = 0.015) between the two groups.
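The two agreement measures reported above can be illustrated with a short sketch. The abstract does not state which ICC form was used or how the consistency rate was tallied, so this example assumes a simple percentage of matching paired reads and the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1). The function names and all input values are hypothetical.

```python
from statistics import mean


def consistency_rate(ai_reads, conventional_reads):
    """Percentage of paired reads where AI-IHC and conventional IHC agree."""
    if not ai_reads or len(ai_reads) != len(conventional_reads):
        raise ValueError("reads must be non-empty and of equal length")
    agree = sum(a == c for a, c in zip(ai_reads, conventional_reads))
    return 100.0 * agree / len(ai_reads)


def icc_2_1(scores):
    """ICC(2,1) for an n-subjects x k-methods table of scores.

    Assumed form: two-way random effects, absolute agreement, single
    measure. The study may have used a different ICC variant.
    Undefined (division by zero) if every score is identical.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with n >= 2 and k >= 2")
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)             # between-subject mean square
    ms_cols = ss_cols / (k - 1)             # between-method mean square
    ms_err = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical paired reads and Ki-67 indices (AI-IHC, conventional IHC).
rate = consistency_rate(["pos", "neg", "pos", "pos"], ["pos", "neg", "neg", "pos"])
icc = icc_2_1([[10.0, 20.0], [20.0, 30.0], [30.0, 40.0]])
print(f"consistency rate: {rate:.2f}%, ICC(2,1): {icc:.3f}")
```

An absolute-agreement ICC penalizes a systematic offset between the two methods, so it can be modest even when the two sets of values rank the cases identically, as in the toy data here.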

Conclusions: We leveraged automatic tile-level annotations from H&E slides to efficiently develop deep learning-based IHC biomarker models, achieving AUCs between 0.90 and 0.96. AI-generated IHC showed substantial concordance with conventional IHC across most markers, supporting its potential as an assistive tool in routine diagnostics.

Source journal: BMC Gastroenterology (Medicine: Gastroenterology & Hepatology)
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 465
Review time: 6 months
Journal description: BMC Gastroenterology is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of gastrointestinal and hepatobiliary disorders, as well as related molecular genetics, pathophysiology, and epidemiology.