Xiaolong Wu, Kejia Hu, Zhichun Fu, Dingguo Zhang
Imaging Neuroscience (Cambridge, Mass.), vol. 3, published 2025-09-10 (eCollection 2025)
DOI: 10.1162/IMAG.a.146
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12434379/pdf/
Improved evaluation of waveform reconstruction in speech decoding based on invasive brain-computer interfaces.
Brain-computer interfaces (BCIs) that reconstruct speech waveforms from neural signals are a promising communication technology. However, the field lacks a standardized evaluation metric, making it difficult to compare results across studies. Existing objective metrics, such as the correlation coefficient (CC) and mel-cepstral distortion (MCD), are often applied inconsistently and have intrinsic limitations. This study addresses the need for a robust, validated method of evaluating reconstructed waveform quality. We review the literature on waveform reconstruction from intracranial signals and identify issues with current evaluation methods, including the absence of any standard that would permit cross-study comparison. We collated reconstructed audio from 10 published speech BCI studies and collected Mean Opinion Scores (MOS) from human raters to serve as a perceptual ground truth. We then systematically evaluated how well combinations of existing objective metrics (STOI and MCD) predict these MOS scores. To ensure robustness and generalizability, we employed a rigorous leave-one-dataset-out cross-validation scheme and compared multiple models, including linear and non-linear regressors. Across the 10 public datasets, a non-linear model, specifically a Random Forest regressor, provided the most accurate and reliable prediction of subjective MOS ratings (R² = 0.892). We propose this cross-validated Random Forest model, which maps STOI and MCD to a predicted MOS score, as a standardized objective evaluation metric for the speech BCI field. Its demonstrated accuracy and robust validation outperform available methods. Moreover, it provides the community with a reliable tool to benchmark performance, enables meaningful cross-study comparisons for the first time, and can accelerate progress in speech neuroprosthetics.
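The evaluation scheme the abstract describes — a Random Forest mapping (STOI, MCD) to a predicted MOS, validated by leaving one dataset out at a time — can be sketched as follows. This is a minimal illustration on synthetic data: the number of clips per study, the feature ranges, and the assumed monotone STOI/MCD-to-MOS relationship are all invented for the example and are not the authors' data or model.

```python
# Hedged sketch of leave-one-dataset-out cross-validation for a
# Random Forest that maps (STOI, MCD) to a predicted MOS score.
# All data below are synthetic; the MOS relationship is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Simulate 10 "studies", each contributing a handful of audio clips.
datasets = []
for _ in range(10):
    n = 20
    stoi = rng.uniform(0.2, 0.9, n)      # intelligibility score in [0, 1]
    mcd = rng.uniform(4.0, 12.0, n)      # spectral distortion in dB (lower = better)
    # Assumed relationship: MOS (1-5) rises with STOI and falls with MCD.
    mos = 1 + 4 * (0.6 * stoi + 0.4 * (1 - (mcd - 4) / 8)) + rng.normal(0, 0.2, n)
    datasets.append((np.column_stack([stoi, mcd]), np.clip(mos, 1, 5)))

# Leave-one-dataset-out: train on 9 studies, predict the held-out one,
# so the metric is never tuned on the study it is scoring.
preds, truths = [], []
for held_out in range(len(datasets)):
    X_tr = np.vstack([X for i, (X, _) in enumerate(datasets) if i != held_out])
    y_tr = np.concatenate([y for i, (_, y) in enumerate(datasets) if i != held_out])
    X_te, y_te = datasets[held_out]
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    preds.append(model.predict(X_te))
    truths.append(y_te)

r2 = r2_score(np.concatenate(truths), np.concatenate(preds))
print(f"leave-one-dataset-out R^2 = {r2:.3f}")
```

The per-study split is the important design choice: pooling all clips and doing a random train/test split would let the model memorize study-specific recording conditions, inflating R² relative to how the metric would behave on a genuinely new BCI study.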