Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification

IF 8.0; Zone 2, Computer Science; JCR Q1, AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence, Volume 160, Article 111887. Pub Date: 2025-08-23. DOI: 10.1016/j.engappai.2025.111887

Albert Belenguer-Llorens, Carlos Sevilla-Salcedo, Jussi Tohka, Vanessa Gómez-Verdejo, Alzheimer’s Disease Neuroimaging Initiative

Citations: 0

Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification

The increasing availability of multi-modal medical data, including neuroimaging, genetic profiles, and clinical measurements, offers unprecedented opportunities for advancing disease diagnosis and prognosis. However, integrating these heterogeneous data sources poses significant challenges due to their high dimensionality, redundancy, and small sample sizes, which hinder the effectiveness of traditional machine learning models.

To overcome these challenges, we present the BAyesian Latent Data Unified Representation model (BALDUR), a novel Bayesian algorithm designed to handle multi-modal datasets and small sample sizes in high-dimensional settings while providing explainable solutions. To do so, the proposed model combines the different data views within a common latent space, extracting the information relevant to the classification task and pruning irrelevant or redundant features and data views. Furthermore, to provide generalizable solutions in small-sample scenarios, BALDUR efficiently integrates dual kernels over the views with a small sample-to-feature ratio. Finally, its linear nature ensures the explainability of the model outcomes, allowing its use for biomarker identification. The model was tested on two different neurodegeneration datasets, outperforming state-of-the-art models and detecting features aligned with markers already described in the scientific literature.
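
The abstract does not spell out how the dual kernels are built, so the following is only a minimal sketch of the general idea behind a linear dual formulation, not the BALDUR algorithm itself. It swaps in plain ridge regression as a stand-in model and assumes a single view `X` with far more features than samples; the function names, the regularization value `lam`, and the synthetic data are all hypothetical. The point it illustrates is that, with a linear kernel, the problem can be solved in an N x N space and the per-feature weights can still be recovered afterwards, which is what keeps a linear model interpretable.

```python
import numpy as np


def dual_ridge_fit(X, y, lam=1.0):
    """Ridge regression solved in the dual (illustrative stand-in model).

    X has shape (N, D) with D >> N. The linear kernel K = X X^T is only
    N x N, so the linear system is small even when D is very large.
    """
    n_samples = X.shape[0]
    K = X @ X.T                                   # linear kernel, (N, N)
    alpha = np.linalg.solve(K + lam * np.eye(n_samples), y)
    w = X.T @ alpha                               # primal weights, (D,)
    return alpha, w


def primal_ridge_fit(X, y, lam=1.0):
    """Same model solved in the primal; requires a D x D system."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic small-sample, high-dimensional view (hypothetical sizes).
rng = np.random.default_rng(0)
N, D = 20, 500
X = rng.standard_normal((N, D))
y = np.sign(rng.standard_normal(N))               # labels in {-1, +1}

alpha, w_dual = dual_ridge_fit(X, y)
w_primal = primal_ridge_fit(X, y)

# Predictions for new samples can use either the kernel or the weights.
X_new = rng.standard_normal((5, D))
pred_kernel = (X_new @ X.T) @ alpha
pred_weights = X_new @ w_dual
```

Both routes give the same weights because of the identity (XᵀX + λI)⁻¹Xᵀy = Xᵀ(XXᵀ + λI)⁻¹y. In this sketch the dual route solves a 20 x 20 system instead of a 500 x 500 one, and `w_dual` still assigns one coefficient to each original feature, which is the property a linear model relies on for inspecting feature relevance.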

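Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60

Self-citation rate: 10.00%

Articles published: 505

Review time: 68 days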

About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.