Lingyu Zhao, Na Zhao, Ruiqi Zhong, Yiru Niu, Ziyi Chang, Peng Su, Zhihui Wang, Lifang Cui, Bei Wang, Huang Chen, Xiaowen Wang, Xiangbing Kong, Baolin Du, Fei Ren, Dingrong Zhong
Diagnostic Pathology 20(1):70. Published 2025-06-02. DOI: https://doi.org/10.1186/s13000-025-01663-w
High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images.
Background: Lung cancer is widely recognized as a prevalent malignant neoplasm. Traditional genetic testing methods face limitations such as high costs and lengthy procedures. The prediction of clinically relevant genetic mutations via histopathological images could facilitate the expedited identification of genetic mutations in clinical settings.
Methods: We collected 2,221 slides from 1,999 patients diagnosed with lung adenocarcinoma. The data include whole-slide images, mutation status for EGFR, KRAS, ALK, HER2, and other rare genes (ROS1, RET, BRAF, PIK3CA, NRAS), and related clinical information. The self-supervised model DINO and the two-stage multi-instance network GAMIL were used to identify mutation status in 9 genes linked to tumorigenesis and cancer progression. Model performance was compared against a foundation model (UNI) and classification models (CLAM and Inception v3), evaluated on external datasets (TCGA and other medical institutions), and compared with human pathologists.
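To make the multi-instance idea concrete: a slide is treated as a bag of patch feature vectors, and a single slide-level prediction is produced by pooling them. The sketch below uses generic attention-based pooling and is not the GAMIL architecture, whose details the abstract does not give. The function name `attention_mil_predict`, the weights, and the toy features are all hypothetical; in practice the patch features would come from an encoder such as a DINO-pretrained network.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_mil_predict(patch_features, attn_w, clf_w, clf_b):
    """Generic attention MIL pooling (an assumption, not GAMIL itself).

    Scores each patch, turns the scores into attention weights, builds a
    weighted-average slide embedding, and applies a logistic classifier.
    """
    attn = softmax([dot(attn_w, f) for f in patch_features])
    dim = len(patch_features[0])
    slide_embedding = [
        sum(a * f[d] for a, f in zip(attn, patch_features)) for d in range(dim)
    ]
    logit = dot(clf_w, slide_embedding) + clf_b
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, attn

# Toy bag of three 2-dimensional patch features (invented for illustration).
patch_features = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
prob, attn = attention_mil_predict(
    patch_features, attn_w=[1.0, 1.0], clf_w=[1.5, -0.5], clf_b=-1.0
)
```

The attention weights sum to one, so they can also be read as a rough indication of which patches drove the slide-level score.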
Results: Our approach outperformed the CLAM and Inception v3 models, achieving AUC values from 0.825 to 0.987 for predicting gene mutations. On the external test datasets, AUC values ranged from 0.516 to 0.843. For EGFR mutation prediction, GAMIL achieved an AUC of 0.810, significantly higher than the average AUC of 0.508 achieved by pathologists.
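For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen mutated case receives a higher score than a randomly chosen wild-type case, so 0.5 corresponds to chance. The following minimal sketch computes it by pairwise comparison; the function name `roc_auc` and the toy labels and scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = mutated, 0 = wild type (hypothetical values).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 pairs ranked correctly
```

The pairwise form is quadratic in the number of cases; it is shown here for clarity, and a rank-based implementation would be preferable for large cohorts.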
Conclusion: The GAMIL models perform strongly in delineating tumor regions in lung adenocarcinoma and in predicting gene mutations. These models have substantial potential to improve molecular testing efficiency and to open new pathways for personalized treatment.
About the journal:
Diagnostic Pathology is an open access, peer-reviewed, online journal that considers research in surgical and clinical pathology, immunology, and biology, with a special focus on cutting-edge approaches in diagnostic pathology and tissue-based therapy. The journal covers all aspects of surgical pathology, including classic diagnostic pathology, prognosis-related diagnosis (tumor stages, prognosis markers, such as MIB-percentage, hormone receptors, etc.), and therapy-related findings. The journal also focuses on the technological aspects of pathology, including molecular biology techniques, morphometry aspects (stereology, DNA analysis, syntactic structure analysis), communication aspects (telecommunication, virtual microscopy, virtual pathology institutions, etc.), and electronic education and quality assurance (for example interactive publication, on-line references with automated updating, etc.).