Cancer detection via one-shot learning: integrating gene expression and genomic mutation analysis
Alessia Petescia, Gerardo Benevento, Anna Falanga, Alessandro Macaro, Delfina Malandrino, Alberto Montefusco, Rosalinda Sorrentino, Rocco Zaccagnino
BMC Bioinformatics 26(1), 239. Published 2025-10-06. DOI: https://doi.org/10.1186/s12859-025-06257-3
Abstract
Background: Cancer is a complex disease influenced by numerous concurrent genetic factors that result in diverse tumor microenvironments (TMEs) across different cancer types. Large-scale genomic projects, such as The Cancer Genome Atlas, have underscored the need for molecular classification of cancer to enable more precise therapeutic strategies. Yet traditional machine learning (ML) approaches currently face several limitations. First, while effective, they predominantly rely on gene expression data and often overlook critical genomic alterations such as copy number alterations, single nucleotide polymorphisms, and other mutational profiles, limiting the scope of biomarker discovery. Most importantly, they are usually limited by the need for large sample sizes.
Results: Building on the hypothesis that type-agnostic representations integrating gene expression with genomic mutations can comprehensively characterize TMEs and capture the similarity or dissimilarity between samples of the same or different types, we propose a novel ML-based method for cancer detection using a one-shot learning framework implemented through Siamese Neural Networks. Our method redefines cancer detection as a similarity-based classification task, allowing the model to generalize to unseen cancer types, a critical advantage in genomics where data scarcity and frequent updates pose significant challenges. To enhance interpretability, we introduce a robust explainability technique founded on SHapley Additive exPlanations (SHAP) values, to provide clear insights into the contributions of gene expression and mutational data, enabling a deeper understanding of the key factors driving cancer detection decisions.
Conclusions: Our experimental results show that integrating mutational profiles with gene expression data allows for more accurate cancer type detection and reveals significant mutation patterns. These findings indicate that the proposed method has the potential to significantly enhance cancer type detection by leveraging a more comprehensive understanding of TMEs. Beyond merely classifying cancer types, the proposed SHAP-based explainability technique enables the identification and the analysis of key biomarkers relevant for immunotherapy success, thereby addressing limitations of existing approaches.
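The similarity-based formulation described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder, the contrastive loss, the feature layout (expression values concatenated with binary mutation flags), the type labels, and the untrained random weights are all assumptions chosen only to show how a shared ("Siamese") encoder turns classification into a nearest-reference lookup with one support sample per cancer type.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Shared encoder: the same weights W embed every sample (the 'twin' branches)."""
    return np.tanh(x @ W)

def distance(a, b, W):
    """Euclidean distance between the embeddings of two samples."""
    return float(np.linalg.norm(embed(a, W) - embed(b, W)))

def contrastive_loss(d, same, margin=1.0):
    """Assumed training objective: pull same-type pairs together,
    push different-type pairs at least `margin` apart."""
    return d ** 2 if same else max(0.0, margin - d) ** 2

def one_shot_predict(query, support, W):
    """Return the label of the support sample closest to the query.
    `support` holds exactly one reference sample per cancer type."""
    return min(support, key=lambda label: distance(query, support[label], W))

# Hypothetical toy features: 4 expression values followed by 3 mutation flags.
support = {
    "type_A": np.array([2.0, 0.1, 0.3, 1.5, 1.0, 0.0, 0.0]),
    "type_B": np.array([0.2, 1.8, 1.1, 0.1, 0.0, 1.0, 1.0]),
}

# Untrained, randomly initialised encoder weights (illustration only).
W = 0.1 * rng.normal(size=(7, 5))

# A new sample is classified by similarity, so adding a cancer type only
# requires adding one reference sample to `support`, not retraining a classifier head.
query = support["type_B"] + 0.05 * rng.normal(size=7)
print(one_shot_predict(query, support, W))
```

In a trained model, W would be learned by minimising a pairwise loss such as `contrastive_loss` over same-type and different-type sample pairs; the lookup step would remain unchanged.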
About the journal:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.