Lost in a large EEG multiverse? Comparing sampling approaches for representative pipeline selection

IF 2.3 · Zone 4 (Medicine) · JCR Q2 · BIOCHEMICAL RESEARCH METHODS

Journal of Neuroscience Methods Pub Date : 2025-09-09 DOI:10.1016/j.jneumeth.2025.110564

Cassie Ann Short, Andrea Hildebrandt, Robin Bosse, Stefan Debener, Metin Özyağcılar, Katharina Paul, Jan Wacker, Daniel Kristanto

Journal of Neuroscience Methods, volume 424, Article 110564. Full text: https://www.sciencedirect.com/science/article/pii/S0165027025002080


Abstract

Background

The multiplicity of defensible pipelines for processing and analysing data has been implicated as a core contributor to low replicability, creating uncertainty about the robustness of results to defensible variations. This is exacerbated where many defensible pipelines exist, such as in processing electroencephalography (EEG) signals. In multiverse analyses, equally defensible pipelines are computed and the robustness across pipelines is reported. Computing all pipelines is often infeasible, and researchers rely on sampling approaches, assuming representativeness of the full multiverse. However, different sampling methods may yield different robustness estimates, introducing what we term multiverse sampling uncertainty.

New method

We developed an open-source tool to compare pipeline samples on their representativeness of the full multiverse. We computed a 528-pipeline use-case multiverse on EEG recordings during an emotion classification task to predict extraversion scores from the Late Positive Potential. We applied three sampling methods (random, stratified, active learning) to sample 26 pipelines (about 5% of the 528) and evaluated the representativeness of model fit distributions.
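The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' tool: the factor grid (4 × 3 × 4 × 11) is invented only so that the product equals 528, the model fit values are simulated, and the two-sample Kolmogorov-Smirnov distance is one assumed way to quantify how closely a sample's fit distribution matches the full multiverse. Active learning is omitted because the abstract does not describe how it was implemented.

```python
import bisect
import itertools
import random

# Hypothetical processing choices; the 4 x 3 x 4 x 11 grid is invented
# only so that the product matches the 528 pipelines in the abstract.
FACTORS = {
    "reference": ["average", "mastoids", "Cz", "REST"],
    "highpass_hz": [0.01, 0.1, 0.5],
    "baseline_ms": [100, 200, 300, 500],
    "time_window": list(range(11)),
}


def build_multiverse(factors):
    """Enumerate every combination of processing choices."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]


def simulated_fit(pipeline, rng):
    """Stand-in for a real model fit value; purely synthetic."""
    shift = {"average": 0.00, "mastoids": 0.02, "Cz": 0.05, "REST": 0.01}
    return shift[pipeline["reference"]] + rng.gauss(0.05, 0.01)


def stratified_sample(pipelines, k, key, rng):
    """Draw near-equal numbers of pipeline indices from each level of one factor."""
    strata = {}
    for i, p in enumerate(pipelines):
        strata.setdefault(p[key], []).append(i)
    base, extra = divmod(k, len(strata))
    chosen = []
    for j, members in enumerate(strata.values()):
        # The first `extra` strata receive one additional pipeline.
        chosen.extend(rng.sample(members, base + (1 if j < extra else 0)))
    return chosen


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


rng = random.Random(0)
pipelines = build_multiverse(FACTORS)
fits = [simulated_fit(p, rng) for p in pipelines]

k = 26  # sample size reported in the abstract
rand_idx = rng.sample(list(range(len(pipelines))), k)
strat_idx = stratified_sample(pipelines, k, "reference", rng)

rand_ks = ks_distance([fits[i] for i in rand_idx], fits)
strat_ks = ks_distance([fits[i] for i in strat_idx], fits)
print(f"random sample KS distance:     {rand_ks:.3f}")
print(f"stratified sample KS distance: {strat_ks:.3f}")
```

A smaller distance indicates a sample whose fit distribution lies closer to that of the full multiverse. Repeating the draw over many seeds would show how much this distance varies from sample to sample, which is the kind of variability the abstract calls multiverse sampling uncertainty.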

Results

Our results highlight variability in the representativeness of model fit distributions across samples, with active learning and stratified sampling most closely representing the full multiverse. Replicability of results is reported using cross-validation, and reproducibility is explored across pipeline sample sizes.

Comparison with existing methods

Large multiverse analyses in neuroimaging typically rely on sampling, but sampling approaches are rarely compared systematically for how well they represent the full multiverse.

Conclusions

The need for representative pipeline sampling to mitigate bias in large multiverse analyses is discussed.


Source journal

Journal of Neuroscience Methods (Medicine: Neurosciences)

CiteScore: 7.10

Self-citation rate: 3.30%

Articles published: 226

Review time: 52 days

About the journal: The Journal of Neuroscience Methods publishes papers that describe new methods that are specifically for neuroscience research conducted in invertebrates, vertebrates or in man. Major methodological improvements or important refinements of established neuroscience methods are also considered for publication. The Journal's scope includes all aspects of contemporary neuroscience research, including anatomical, behavioural, biochemical, cellular, computational, molecular, invasive and non-invasive imaging, optogenetic, and physiological research investigations.