Evaluating machine learning pipelines for multimodal neuroimaging in small cohorts: an ALS case study.

IF 2.5 4区医学 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in Neuroinformatics Pub Date : 2025-06-13 eCollection Date: 2025-01-01 DOI:10.3389/fninf.2025.1568116

Shailesh Appukuttan, Aude-Marie Grapperon, Mounir Mohamed El Mendili, Hugo Dary, Maxime Guye, Annie Verschueren, Jean-Philippe Ranjeva, Shahram Attarian, Wafaa Zaaraoui, Matthieu Gilson

{"title":"Evaluating machine learning pipelines for multimodal neuroimaging in small cohorts: an ALS case study.","authors":"Shailesh Appukuttan, Aude-Marie Grapperon, Mounir Mohamed El Mendili, Hugo Dary, Maxime Guye, Annie Verschueren, Jean-Philippe Ranjeva, Shahram Attarian, Wafaa Zaaraoui, Matthieu Gilson","doi":"10.3389/fninf.2025.1568116","DOIUrl":null,"url":null,"abstract":"<p><p>Advancements in machine learning hold great promise for the analysis of multimodal neuroimaging data. They can help identify biomarkers and improve diagnosis for various neurological disorders. However, the application of such techniques for rare and heterogeneous diseases remains challenging due to small-cohorts available for acquiring data. Efforts are therefore commonly directed toward improving the classification models, in an effort to optimize outcomes given the limited data. In this study, we systematically evaluated the impact of various machine learning pipeline configurations, including scaling methods, feature selection, dimensionality reduction, and hyperparameter optimization. The efficacy of such components in the pipeline was evaluated on classification performance using multimodal MRI data from a cohort of 16 ALS patients and 14 healthy controls. Our findings reveal that, while certain pipeline components, such as subject-wise feature normalization, help improve classification outcomes, the overall influence of pipeline refinements on performance is modest. Feature selection and dimensionality reduction steps were found to have limited utility, and the choice of hyperparameter optimization strategies produced only marginal gains. Our results suggest that, for small-cohort studies, the emphasis should shift from extensive tuning of these pipelines to addressing data-related limitations, such as progressively expanding cohort size, integrating additional modalities, and maximizing the information extracted from existing datasets. This study provides a methodological framework to guide future research and emphasizes the need for dataset enrichment to improve clinical utility.</p>","PeriodicalId":12462,"journal":{"name":"Frontiers in Neuroinformatics","volume":"19 ","pages":"1568116"},"PeriodicalIF":2.5000,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12202540/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Neuroinformatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fninf.2025.1568116","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Advancements in machine learning hold great promise for the analysis of multimodal neuroimaging data. They can help identify biomarkers and improve diagnosis for various neurological disorders. However, the application of such techniques for rare and heterogeneous diseases remains challenging due to small-cohorts available for acquiring data. Efforts are therefore commonly directed toward improving the classification models, in an effort to optimize outcomes given the limited data. In this study, we systematically evaluated the impact of various machine learning pipeline configurations, including scaling methods, feature selection, dimensionality reduction, and hyperparameter optimization. The efficacy of such components in the pipeline was evaluated on classification performance using multimodal MRI data from a cohort of 16 ALS patients and 14 healthy controls. Our findings reveal that, while certain pipeline components, such as subject-wise feature normalization, help improve classification outcomes, the overall influence of pipeline refinements on performance is modest. Feature selection and dimensionality reduction steps were found to have limited utility, and the choice of hyperparameter optimization strategies produced only marginal gains. Our results suggest that, for small-cohort studies, the emphasis should shift from extensive tuning of these pipelines to addressing data-related limitations, such as progressively expanding cohort size, integrating additional modalities, and maximizing the information extracted from existing datasets. This study provides a methodological framework to guide future research and emphasizes the need for dataset enrichment to improve clinical utility.

Abstract Image

查看原文本刊更多论文

在小队列中评估多模态神经成像的机器学习管道：ALS案例研究。

机器学习的进步为多模态神经成像数据的分析带来了巨大的希望。它们可以帮助识别生物标志物，提高对各种神经系统疾病的诊断。然而，由于可用于获取数据的队列较少，将此类技术应用于罕见和异质性疾病仍然具有挑战性。因此，努力通常指向改进分类模型，努力在有限的数据下优化结果。在这项研究中，我们系统地评估了各种机器学习管道配置的影响，包括缩放方法、特征选择、降维和超参数优化。使用来自16名ALS患者和14名健康对照者的多模态MRI数据，对管道中这些成分的功效进行了分类性能评估。我们的研究结果表明，虽然某些管道组件（如主题特征归一化）有助于提高分类结果，但管道改进对性能的总体影响是适度的。发现特征选择和降维步骤的效用有限，超参数优化策略的选择只产生边际收益。我们的研究结果表明，对于小队列研究，重点应从广泛调整这些管道转向解决与数据相关的限制，例如逐步扩大队列规模，整合其他模式，并最大限度地从现有数据集中提取信息。该研究为指导未来的研究提供了一个方法学框架，并强调需要丰富数据集以提高临床实用性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Frontiers in Neuroinformatics MATHEMATICAL & COMPUTATIONAL BIOLOGY-NEUROSCIENCES

CiteScore

4.80

自引率

5.70%

发文量

132

审稿时长

14 weeks

期刊介绍： Frontiers in Neuroinformatics publishes rigorously peer-reviewed research on the development and implementation of numerical/computational models and analytical tools used to share, integrate and analyze experimental data and advance theories of the nervous system functions. Specialty Chief Editors Jan G. Bjaalie at the University of Oslo and Sean L. Hill at the École Polytechnique Fédérale de Lausanne are supported by an outstanding Editorial Board of international experts. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics and the public worldwide. Neuroscience is being propelled into the information age as the volume of information explodes, demanding organization and synthesis. Novel synthesis approaches are opening up a new dimension for the exploration of the components of brain elements and systems and the vast number of variables that underlie their functions. Neural data is highly heterogeneous with complex inter-relations across multiple levels, driving the need for innovative organizing and synthesizing approaches from genes to cognition, and covering a range of species and disease states. Frontiers in Neuroinformatics therefore welcomes submissions on existing neuroscience databases, development of data and knowledge bases for all levels of neuroscience, applications and technologies that can facilitate data sharing (interoperability, formats, terminologies, and ontologies), and novel tools for data acquisition, analyses, visualization, and dissemination of nervous system data. Our journal welcomes submissions on new tools (software and hardware) that support brain modeling, and the merging of neuroscience databases with brain models used for simulation and visualization.