Challenges in AI-driven multi-omics data analysis for Oncology: Addressing dimensionality, sparsity, transparency and ethical considerations

JCR: Q1 (Medicine)

Informatics in Medicine Unlocked. Pub Date: 2025-01-01. DOI: 10.1016/j.imu.2025.101679

Maryem Ouhmouk, Shakuntala Baichoo, Mounia Abik

Volume 57, Article 101679. Full text: https://www.sciencedirect.com/science/article/pii/S2352914825000681

Citations: 0

Abstract

Artificial intelligence, particularly deep learning, is becoming increasingly prominent in multi-omics research, especially since traditional statistical models struggle to handle the complexity and high dimensionality of such data. By combining different types of omics data across layers and modalities, AI techniques can unveil hidden connections, detect biomarkers, and improve disease prediction, which can lead to significant advancements in precision medicine. In this review, we gathered deep learning-based multi-omics integration methods for oncology published since 2020. We concentrated on studies utilizing cancer omics data, mainly sourced from The Cancer Genome Atlas (TCGA) database. As a result, we identified 32 articles that generally fulfilled the criteria. We studied their techniques and their ability to handle challenges in analyzing multi-omics data, particularly regarding missing data, dimensionality, and processing workflows. We also discuss how well these methods consider explainability, interpretability, and ethical aspects when developing solutions that handle private and sensitive medical information.

From the 32 studies, we can divide deep learning-based multi-omics integration methods into two types: non-generative and generative models. Non-generative approaches, such as feedforward neural networks (FFNs), graph convolutional networks (GCNs), and autoencoders, are designed to extract features and perform classification directly. On the other hand, generative methods such as variational autoencoders (VAEs), generative adversarial networks (GANs), and generative pretrained transformers (GPTs) focus on creating adaptable representations that can be shared across multiple modalities. These methods have advanced the handling of missing data and dimensionality, outperforming traditional approaches. However, most reviewed models remain at the proof-of-concept stage, with limited clinical validation or real-world deployment.
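The distinction between the two families can be illustrated with a minimal sketch. It is not taken from any of the reviewed models: the toy sample, the fixed weights, and the function names (`linear`, `encode_deterministic`, `encode_variational`) are all hypothetical, and a real model would learn its weights from data. The deterministic encoder stands in for a plain autoencoder, which maps an input to a single latent code. The variational encoder stands in for a VAE, which predicts a Gaussian per latent dimension and samples from it using the reparameterization z = mu + sigma * eps.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def encode_deterministic(x, weights, bias):
    """Autoencoder-style encoder: the same input always yields the same code."""
    return [math.tanh(h) for h in linear(x, weights, bias)]


def encode_variational(x, w_mu, b_mu, w_logvar, b_logvar, rng):
    """VAE-style encoder: predict mean and log-variance, then sample a latent."""
    mu = linear(x, w_mu, b_mu)
    logvar = linear(x, w_logvar, b_logvar)
    # Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    return z, mu, logvar


# Hypothetical toy sample: two omics layers concatenated into one vector.
expression = [0.8, -1.2, 0.3]
methylation = [0.1, 0.9]
x = expression + methylation

# Hypothetical fixed weights mapping 5 inputs to 2 latent dimensions.
W = [[0.2, -0.1, 0.4, 0.0, 0.3], [-0.3, 0.5, 0.1, 0.2, -0.2]]
b = [0.0, 0.1]
W_logvar = [[0.0] * 5, [0.0] * 5]
b_logvar = [-2.0, -2.0]

code = encode_deterministic(x, W, b)

rng = random.Random(0)  # seeded so the draws are reproducible
z1, mu, logvar = encode_variational(x, W, b, W_logvar, b_logvar, rng)
z2, _, _ = encode_variational(x, W, b, W_logvar, b_logvar, rng)
```

Encoding the same sample twice with `encode_deterministic` returns an identical code, whereas `z1` and `z2` are two different draws around the same mean `mu`. That sampling step is what lets a generative model produce new latent points, for instance when standing in for a missing modality.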


Source journal

Informatics in Medicine Unlocked (Medicine, Health Informatics)

CiteScore: 9.50

Self-citation rate: 0.00%

Articles published: 282

Review time: 39 days

About the journal: Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.