The impact of cross-validation choices on pBCI classification metrics: lessons for transparent reporting.

Frontiers in Neuroergonomics · IF 1.9 · Q3 (Ergonomics)
Pub Date: 2025-07-01 · eCollection Date: 2025-01-01 · DOI: 10.3389/fnrgo.2025.1582724
Felix Schroeder, Stephen Fairclough, Frederic Dehais, Matthew Richins
{"title":"The impact of cross-validation choices on pBCI classification metrics: lessons for transparent reporting.","authors":"Felix Schroeder, Stephen Fairclough, Frederic Dehais, Matthew Richins","doi":"10.3389/fnrgo.2025.1582724","DOIUrl":null,"url":null,"abstract":"<p><p>Neuroadaptive technologies are a type of passive Brain-computer interface (pBCI) that aim to incorporate implicit user-state information into human-machine interactions by monitoring neurophysiological signals. Evaluating machine learning and signal processing approaches represents a core aspect of research into neuroadaptive technologies. These evaluations are often conducted under controlled laboratory settings and offline, where exhaustive analyses are possible. However, the manner in which classifiers are evaluated offline has been shown to impact reported accuracy levels, possibly biasing conclusions. In the current study, we investigated one of these sources of bias, the choice of cross-validation scheme, which is often not reported in sufficient detail. Across three independent electroencephalography (EEG) n-back datasets and 74 participants, we show how metrics and conclusions based on the same data can diverge with different cross-validation choices. A comparison of cross-validation schemes in which train and test subset boundaries either respect the block-structure of the data collection or not, illustrated how the relative performance of classifiers varies significantly with the evaluation method used. By computing bootstrapped 95% confidence intervals of differences across datasets, we showed that classification accuracies of Riemannian minimum distance (RMDM) classifiers may differ by up to 12.7% while those of a Filter Bank Common Spatial Pattern (FBCSP) based linear discriminant analysis (LDA) may differ by up to 30.4%. These differences across cross-validation implementations may impact the conclusions presented in research papers, which can complicate efforts to foster reproducibility. Our results exemplify why detailed reporting on data splitting procedures should become common practice.</p>","PeriodicalId":517413,"journal":{"name":"Frontiers in neuroergonomics","volume":"6 ","pages":"1582724"},"PeriodicalIF":1.9000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12259573/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in neuroergonomics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fnrgo.2025.1582724","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"ERGONOMICS","Score":null,"Total":0}
引用次数: 0

Abstract

Neuroadaptive technologies are a type of passive brain-computer interface (pBCI) that aim to incorporate implicit user-state information into human-machine interactions by monitoring neurophysiological signals. Evaluating machine learning and signal processing approaches is a core aspect of research into neuroadaptive technologies. These evaluations are often conducted offline under controlled laboratory settings, where exhaustive analyses are possible. However, the manner in which classifiers are evaluated offline has been shown to affect reported accuracy levels, potentially biasing conclusions. In the current study, we investigated one of these sources of bias: the choice of cross-validation scheme, which is often not reported in sufficient detail. Across three independent electroencephalography (EEG) n-back datasets and 74 participants, we show how metrics and conclusions based on the same data can diverge under different cross-validation choices. A comparison of cross-validation schemes whose train and test subset boundaries either respect or ignore the block structure of the data collection illustrated how the relative performance of classifiers varies significantly with the evaluation method used. By computing bootstrapped 95% confidence intervals of the differences across datasets, we showed that classification accuracies of Riemannian minimum distance (RMDM) classifiers may differ by up to 12.7%, while those of a filter bank common spatial pattern (FBCSP) based linear discriminant analysis (LDA) may differ by up to 30.4%. These differences across cross-validation implementations may affect the conclusions presented in research papers, which complicates efforts to foster reproducibility. Our results exemplify why detailed reporting of data-splitting procedures should become common practice.
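The code below is a minimal, illustrative sketch (Python, scikit-learn) of the two evaluation choices contrasted in the abstract: a shuffled k-fold split that ignores the block structure of the recording versus a block-wise split that holds out whole blocks, followed by a simple percentile bootstrap of the accuracy difference. It is not the authors' pipeline; the simulated n-back-style feature matrix, the LDA stand-in classifier, and all variable names are assumptions for illustration only, and the paper's RMDM and FBCSP-based analyses are not reproduced here.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical block-structured dataset: 6 recording blocks of 40 epochs each,
# 20 band-power-like features. The workload label is constant within a block,
# and each block carries its own slow drift (non-stationarity).
n_blocks, epochs_per_block, n_features = 6, 40, 20
X_parts, y_parts, block_parts = [], [], []
for b in range(n_blocks):
    label = b % 2                                    # low vs. high workload block
    drift = rng.normal(0.0, 1.0, n_features)         # block-specific offset
    feats = rng.normal(0.0, 1.0, (epochs_per_block, n_features)) + drift + 0.3 * label
    X_parts.append(feats)
    y_parts.append(np.full(epochs_per_block, label))
    block_parts.append(np.full(epochs_per_block, b))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
blocks = np.concatenate(block_parts)

clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())

# Scheme A: shuffled k-fold, ignoring block structure (epochs from the same
# block can end up in both train and test folds).
acc_shuffled = cross_val_score(clf, X, y, cv=KFold(n_splits=6, shuffle=True, random_state=0))

# Scheme B: block-wise split, respecting block structure (each test fold is a
# held-out recording block).
acc_blockwise = cross_val_score(clf, X, y, groups=blocks, cv=GroupKFold(n_splits=6))

# Percentile bootstrap of the mean per-fold accuracy difference (95% CI).
diffs = acc_shuffled - acc_blockwise
boot_means = np.array([rng.choice(diffs, size=diffs.size, replace=True).mean()
                       for _ in range(10_000)])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"shuffled k-fold accuracy: {acc_shuffled.mean():.3f}")
print(f"block-wise accuracy:      {acc_blockwise.mean():.3f}")
print(f"difference, 95% CI:       [{ci_low:.3f}, {ci_high:.3f}]")

In this toy setup the shuffled split typically reports higher accuracy, because epochs from the same block (and hence the same block-specific drift) appear in both train and test folds; that is exactly the kind of leakage a block-respecting scheme is designed to expose, and why the abstract argues the splitting procedure must be reported in detail.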
