Cancer detection via one-shot learning: integrating gene expression and genomic mutation analysis.

IF 3.3 | Zone 3, Biology | JCR Q2, Biochemical Research Methods

BMC Bioinformatics, 26(1), 239. Pub Date: 2025-10-06. DOI: 10.1186/s12859-025-06257-3

Alessia Petescia, Gerardo Benevento, Anna Falanga, Alessandro Macaro, Delfina Malandrino, Alberto Montefusco, Rosalinda Sorrentino, Rocco Zaccagnino


Abstract

Background: Cancer is a complex disease shaped by numerous concurrent genetic factors, which give rise to diverse tumor microenvironments (TMEs) across cancer types. Large-scale genomic projects, such as The Cancer Genome Atlas, have underscored the need for molecular classification of cancer to enable more precise therapeutic strategies. Yet traditional machine learning (ML) approaches face several limitations. First, although effective, they rely predominantly on gene expression data and often overlook critical genomic alterations such as copy number alterations, single nucleotide polymorphisms, and other mutational profiles, which limits the scope of biomarker discovery. Second, and most importantly, they usually require large sample sizes.

Results: Building on the hypothesis that type-agnostic representations integrating gene expression with genomic mutations can comprehensively characterize TMEs and capture the similarity or dissimilarity between samples of the same or different types, we propose a novel ML-based method for cancer detection that uses a one-shot learning framework implemented with Siamese Neural Networks. Our method recasts cancer detection as a similarity-based classification task, which allows the model to generalize to unseen cancer types. This is a critical advantage in genomics, where data scarcity and frequent updates pose significant challenges. To enhance interpretability, we introduce a robust explainability technique based on SHapley Additive exPlanations (SHAP) values. It provides clear insights into the contributions of gene expression and mutational data, enabling a deeper understanding of the key factors driving cancer detection decisions.
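The similarity-based formulation described above can be illustrated with a minimal sketch. It is not the authors' architecture: the feature sizes, the single-layer shared encoder, the concatenation of expression and mutation features, the contrastive loss, and all sample values are assumptions chosen only to show how a Siamese encoder turns classification into a nearest-reference lookup with one example per cancer type.

```python
import math
import random

random.seed(0)

# Hypothetical sizes: 6 expression features, 4 mutation flags, 3-dim embedding.
N_EXPR, N_MUT, EMB = 6, 4, 3

# One weight matrix shared by both branches; the sharing is what makes the
# network "Siamese". Weights are random here, i.e. the encoder is untrained.
W = [[random.gauss(0.0, 0.3) for _ in range(EMB)] for _ in range(N_EXPR + N_MUT)]


def embed(sample):
    """Encode an (expression, mutation) pair with the shared encoder."""
    expr, mut = sample
    x = list(expr) + list(mut)  # assumed integration step: plain concatenation
    return [math.tanh(sum(xi * W[i][j] for i, xi in enumerate(x))) for j in range(EMB)]


def distance(a, b):
    """Euclidean distance between the embeddings of two samples."""
    return math.dist(embed(a), embed(b))


def contrastive_loss(a, b, same, margin=1.0):
    """Assumed training objective: pull same-type pairs together and push
    different-type pairs at least `margin` apart."""
    d = distance(a, b)
    return d ** 2 if same else max(0.0, margin - d) ** 2


def one_shot_predict(query, support):
    """Return the label whose single reference sample is closest to the query."""
    return min(support, key=lambda label: distance(query, support[label]))


# Made-up toy samples: one reference per type (the "one shot").
support = {
    "type_A": ([2.1, 0.3, 1.7, 0.0, 0.9, 1.2], [1, 0, 0, 1]),
    "type_B": ([0.2, 1.9, 0.1, 2.2, 0.4, 0.0], [0, 1, 1, 0]),
}
query = ([2.0, 0.4, 1.6, 0.1, 1.0, 1.1], [1, 0, 0, 1])
print(one_shot_predict(query, support))
```

Because prediction only compares a query against reference samples, adding a new cancer type in this sketch means adding one entry to `support`, with no retraining of the encoder.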

Conclusions: Our experimental results show that integrating mutational profiles with gene expression data allows for more accurate cancer type detection and reveals significant mutation patterns. These findings indicate that the proposed method has the potential to significantly enhance cancer type detection by leveraging a more comprehensive understanding of TMEs. Beyond merely classifying cancer types, the proposed SHAP-based explainability technique enables the identification and the analysis of key biomarkers relevant for immunotherapy success, thereby addressing limitations of existing approaches.
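SHAP attributions are built on Shapley values, which split a model's output among its inputs by averaging each input's marginal contribution over all subsets of the other inputs. The sketch below computes exact Shapley values by enumeration for two feature groups. It does not use the `shap` library and is not the authors' technique; `toy_score` and every number in it are invented purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: weighted average of each player's marginal
    contribution over all subsets of the remaining players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_score(present):
    """Hypothetical model score when only the feature groups in `present`
    are available. All numbers are made up for this example."""
    score = 0.5  # baseline with no informative features
    if "expression" in present:
        score += 0.30
    if "mutation" in present:
        score += 0.10
    if "expression" in present and "mutation" in present:
        score += 0.06  # assumed interaction between the two modalities
    return score


groups = ["expression", "mutation"]
phi = shapley_values(groups, toy_score)
for name in groups:
    print(name, round(phi[name], 4))
```

The attributions sum to the difference between the score with all groups present and the baseline score, which is the additivity property that gives SHAP its name. Enumeration is exponential in the number of players, so it is only practical for a handful of feature groups like this.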

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.