The impact of cross-validation choices on pBCI classification metrics: lessons for transparent reporting.

Frontiers in Neuroergonomics · IF 1.9 · Q3 (Ergonomics)
Pub Date: 2025-07-01 · eCollection Date: 2025-01-01 · DOI: 10.3389/fnrgo.2025.1582724
Felix Schroeder, Stephen Fairclough, Frederic Dehais, Matthew Richins
{"title":"The impact of cross-validation choices on pBCI classification metrics: lessons for transparent reporting.","authors":"Felix Schroeder, Stephen Fairclough, Frederic Dehais, Matthew Richins","doi":"10.3389/fnrgo.2025.1582724","DOIUrl":null,"url":null,"abstract":"<p><p>Neuroadaptive technologies are a type of passive Brain-computer interface (pBCI) that aim to incorporate implicit user-state information into human-machine interactions by monitoring neurophysiological signals. Evaluating machine learning and signal processing approaches represents a core aspect of research into neuroadaptive technologies. These evaluations are often conducted under controlled laboratory settings and offline, where exhaustive analyses are possible. However, the manner in which classifiers are evaluated offline has been shown to impact reported accuracy levels, possibly biasing conclusions. In the current study, we investigated one of these sources of bias, the choice of cross-validation scheme, which is often not reported in sufficient detail. Across three independent electroencephalography (EEG) n-back datasets and 74 participants, we show how metrics and conclusions based on the same data can diverge with different cross-validation choices. A comparison of cross-validation schemes in which train and test subset boundaries either respect the block-structure of the data collection or not, illustrated how the relative performance of classifiers varies significantly with the evaluation method used. By computing bootstrapped 95% confidence intervals of differences across datasets, we showed that classification accuracies of Riemannian minimum distance (RMDM) classifiers may differ by up to 12.7% while those of a Filter Bank Common Spatial Pattern (FBCSP) based linear discriminant analysis (LDA) may differ by up to 30.4%. These differences across cross-validation implementations may impact the conclusions presented in research papers, which can complicate efforts to foster reproducibility. Our results exemplify why detailed reporting on data splitting procedures should become common practice.</p>","PeriodicalId":517413,"journal":{"name":"Frontiers in neuroergonomics","volume":"6 ","pages":"1582724"},"PeriodicalIF":1.9000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12259573/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in neuroergonomics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fnrgo.2025.1582724","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"ERGONOMICS","Score":null,"Total":0}
引用次数: 0

Abstract

Neuroadaptive technologies are a type of passive brain-computer interface (pBCI) that aim to incorporate implicit user-state information into human-machine interactions by monitoring neurophysiological signals. Evaluating machine learning and signal processing approaches is a core aspect of research into neuroadaptive technologies. These evaluations are often conducted offline under controlled laboratory settings, where exhaustive analyses are possible. However, the manner in which classifiers are evaluated offline has been shown to affect reported accuracy levels, potentially biasing conclusions. In the current study, we investigated one of these sources of bias: the choice of cross-validation scheme, which is often not reported in sufficient detail. Across three independent electroencephalography (EEG) n-back datasets and 74 participants, we show how metrics and conclusions based on the same data can diverge under different cross-validation choices. A comparison of cross-validation schemes whose train and test subset boundaries either respect or ignore the block structure of the data collection illustrated how the relative performance of classifiers varies significantly with the evaluation method used. By computing bootstrapped 95% confidence intervals of the differences across datasets, we showed that classification accuracies of Riemannian minimum distance (RMDM) classifiers may differ by up to 12.7%, while those of a filter bank common spatial pattern (FBCSP) based linear discriminant analysis (LDA) may differ by up to 30.4%. These differences across cross-validation implementations may affect the conclusions presented in research papers, which complicates efforts to foster reproducibility. Our results exemplify why detailed reporting of data-splitting procedures should become common practice.
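The code below is a minimal, illustrative sketch (Python, scikit-learn) of the two evaluation choices contrasted in the abstract: a shuffled k-fold split that ignores the block structure of the recording versus a block-wise split that holds out whole blocks, followed by a simple percentile bootstrap of the accuracy difference. It is not the authors' pipeline; the simulated n-back-style feature matrix, the LDA stand-in classifier, and all variable names are assumptions for illustration only, and the paper's RMDM and FBCSP-based analyses are not reproduced here.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical block-structured dataset: 6 recording blocks of 40 epochs each,
# 20 band-power-like features. The workload label is constant within a block,
# and each block carries its own slow drift (non-stationarity).
n_blocks, epochs_per_block, n_features = 6, 40, 20
X_parts, y_parts, block_parts = [], [], []
for b in range(n_blocks):
    label = b % 2                                    # low vs. high workload block
    drift = rng.normal(0.0, 1.0, n_features)         # block-specific offset
    feats = rng.normal(0.0, 1.0, (epochs_per_block, n_features)) + drift + 0.3 * label
    X_parts.append(feats)
    y_parts.append(np.full(epochs_per_block, label))
    block_parts.append(np.full(epochs_per_block, b))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
blocks = np.concatenate(block_parts)

clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())

# Scheme A: shuffled k-fold, ignoring block structure (epochs from the same
# block can end up in both train and test folds).
acc_shuffled = cross_val_score(clf, X, y, cv=KFold(n_splits=6, shuffle=True, random_state=0))

# Scheme B: block-wise split, respecting block structure (each test fold is a
# held-out recording block).
acc_blockwise = cross_val_score(clf, X, y, groups=blocks, cv=GroupKFold(n_splits=6))

# Percentile bootstrap of the mean per-fold accuracy difference (95% CI).
diffs = acc_shuffled - acc_blockwise
boot_means = np.array([rng.choice(diffs, size=diffs.size, replace=True).mean()
                       for _ in range(10_000)])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"shuffled k-fold accuracy: {acc_shuffled.mean():.3f}")
print(f"block-wise accuracy:      {acc_blockwise.mean():.3f}")
print(f"difference, 95% CI:       [{ci_low:.3f}, {ci_high:.3f}]")

In this toy setup the shuffled split typically reports higher accuracy, because epochs from the same block (and hence the same block-specific drift) appear in both train and test folds; that is exactly the kind of leakage a block-respecting scheme is designed to expose, and why the abstract argues the splitting procedure must be reported in detail.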
