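High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images.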

IF 2.3 | Zone 3, Medicine | JCR Q2, Pathology

Diagnostic Pathology | Pub Date: 2025-06-02 | DOI: 10.1186/s13000-025-01663-w

Lingyu Zhao, Na Zhao, Ruiqi Zhong, Yiru Niu, Ziyi Chang, Peng Su, Zhihui Wang, Lifang Cui, Bei Wang, Huang Chen, Xiaowen Wang, Xiangbing Kong, Baolin Du, Fei Ren, Dingrong Zhong

Citations: 0

Abstract

High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images.

Background: Lung cancer is widely recognized as a prevalent malignant neoplasm. Traditional genetic testing methods face limitations such as high costs and lengthy procedures. The prediction of clinically relevant genetic mutations via histopathological images could facilitate the expedited identification of genetic mutations in clinical settings.

Methods: We collected 2,221 slides from 1,999 patients diagnosed with lung adenocarcinoma. The data include whole-slide image data, mutation status for EGFR, KRAS, ALK, HER2, and other rare genes (ROS1, RET, BRAF, PIK3CA, NRAS), and related clinical information. The self-supervised model DINO and the two-stage multi-instance network GAMIL were employed to identify mutation status in 9 genes linked to tumorigenesis and cancer progression. Model performance was compared against a foundation model (UNI) and classification models (CLAM and Inception v3), evaluated on external datasets (TCGA and other medical institutions), and benchmarked against human pathologists.
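The abstract does not describe GAMIL's internals, so the following is only a generic sketch of attention-based multi-instance pooling, the general family of techniques that multi-instance learning on whole-slide images commonly draws on. It is not the authors' network. The function names (`attention_pool`, `slide_probability`), the toy tile features, and all weights are hypothetical; in practice the per-tile features would come from a pretrained encoder such as a DINO-trained backbone.

```python
import math

def attention_pool(instances, w, v):
    """Pool per-tile feature vectors into one slide-level vector.

    Generic attention pooling (an assumption, not GAMIL itself):
    score_i = w . tanh(V h_i), weights = softmax(scores),
    pooled = sum_i weights_i * h_i.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(row[j] * h[j] for j in range(len(h)))) for row in v]
        scores.append(sum(wk * hk for wk, hk in zip(w, hidden)))
    # Numerically stable softmax over the tiles of one slide.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return pooled, weights

def slide_probability(pooled, coef, bias):
    """Logistic head mapping the pooled vector to a mutation probability."""
    z = sum(c * x for c, x in zip(coef, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy slide: three tiles with 2-dimensional features.
tiles = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical attention projection
w = [1.0, 1.0]                 # hypothetical attention vector
pooled, weights = attention_pool(tiles, w, V)
prob = slide_probability(pooled, coef=[0.5, -0.3], bias=0.0)
```

The attention weights sum to one per slide, so they can also be read as a rough indication of which tiles drive the slide-level prediction.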

Results: Our approach outperforms the CLAM and Inception v3 models, achieving AUC values ranging from 0.825 to 0.987 for predicting gene mutations. On the external test datasets, AUC values ranged from 0.516 to 0.843. Furthermore, in EGFR mutation prediction, GAMIL achieved an AUC of 0.810, significantly higher than the average AUC of 0.508 achieved by pathologists.
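To help read the AUC figures above, here is a minimal sketch of how AUC can be computed from binary labels and predicted scores, using its pairwise-ranking definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name `roc_auc` and the toy data are illustrative assumptions, not data from the study. Under this definition, an AUC of 0.5 corresponds to chance-level ranking.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy example: 1 = mutated, 0 = wild type.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
auc = roc_auc(labels, scores)        # 8 of 9 pairs ranked correctly
chance = roc_auc([1, 0], [0.5, 0.5]) # identical scores give 0.5
```

The double loop is quadratic in the number of cases and is written for clarity; a rank-based formulation gives the same value more efficiently on large cohorts.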

Conclusion: The GAMIL models exhibit outstanding performance in delineating tumor regions in lung adenocarcinoma and in forecasting gene mutations. The utilization of these models presents substantial potential for markedly improving molecular testing efficiency and opening novel pathways for personalized treatment.

Trial registration: Not applicable.

Source journal

Diagnostic Pathology (Medicine: Pathology)

CiteScore

4.60

Self-citation rate

0.00%

Publication volume

Review time

1 month

Journal description: Diagnostic Pathology is an open access, peer-reviewed, online journal that considers research in surgical and clinical pathology, immunology, and biology, with a special focus on cutting-edge approaches in diagnostic pathology and tissue-based therapy. The journal covers all aspects of surgical pathology, including classic diagnostic pathology, prognosis-related diagnosis (tumor stages, prognosis markers, such as MIB-percentage, hormone receptors, etc.), and therapy-related findings. The journal also focuses on the technological aspects of pathology, including molecular biology techniques, morphometry aspects (stereology, DNA analysis, syntactic structure analysis), communication aspects (telecommunication, virtual microscopy, virtual pathology institutions, etc.), and electronic education and quality assurance (for example interactive publication, on-line references with automated updating, etc.).