Effective Phoneme Decoding With Hyperbolic Neural Networks for High-Performance Speech BCIs

IF 4.8 · Zone 2 (Medicine) · JCR Q2 · ENGINEERING, BIOMEDICAL
Xianhan Tan;Qi Lian;Junming Zhu;Jianmin Zhang;Yueming Wang;Yu Qi
IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 32, pp. 3432-3441. Published 2024-09-10. Journal Article.
DOI: 10.1109/TNSRE.2024.3457313
https://ieeexplore.ieee.org/document/10672534/
Citations: 0

Abstract

Objective: Speech brain-computer interfaces (speech BCIs), which convert brain signals into spoken words or sentences, have demonstrated great potential for high-performance BCI communication. Phonemes are the basic units of pronunciation. For monosyllabic languages such as Mandarin Chinese, where a word usually contains fewer than three phonemes, accurate phoneme decoding plays a vital role. We found that in the neural representation space, phonemes with similar pronunciations are often inseparable, leading to confusion in phoneme classification. Methods: We mapped the neural signals of phoneme pronunciation into a hyperbolic space to obtain a more distinct phoneme representation. Critically, we proposed a hyperbolic hierarchical clustering approach that learns a phoneme-level structure to guide the representation. Results: We found that this representation increased the distance between similar phonemes, effectively reducing confusion. In the phoneme decoding task, our approach achieved an average accuracy of 75.21% for 21 phonemes and outperformed existing methods across different experimental days. Conclusion: Our approach showed high accuracy in phoneme classification. By learning the phoneme-level neural structure, the representations of neural signals became more discriminative and interpretable. Significance: Our approach can potentially facilitate high-performance speech BCIs for Chinese and other monosyllabic languages.
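To make the Methods step more concrete, the following is a minimal sketch of what mapping feature vectors into a hyperbolic space and measuring distances there can look like. The abstract does not say which hyperbolic model the authors use; the Poincaré ball is assumed here purely for illustration, and the function names `expmap0` and `poincare_distance`, the curvature parameter `c`, and the sample vector are all hypothetical rather than taken from the paper.

```python
import numpy as np


def expmap0(v, c=1.0, eps=1e-9):
    """Map a Euclidean (tangent) vector at the origin into the Poincare ball.

    Assumes a ball of curvature -c, i.e. radius 1/sqrt(c).
    """
    v = np.asarray(v, dtype=float)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    sqrt_c = np.sqrt(c)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between two points inside the Poincare ball."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    # Clamp to the domain of arccosh to guard against rounding below 1.
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)


# Hypothetical 2-D feature vector standing in for a neural embedding.
feature = np.array([0.3, 0.4])
point = expmap0(feature)
origin = np.zeros(2)
dist_from_origin = poincare_distance(origin, point)
```

Distances in this model grow rapidly as points approach the boundary of the ball, which is one reason hyperbolic spaces are often paired with hierarchical (tree-like) structure; whether the paper exploits this in the same way is not stated in the abstract.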
Source journal
CiteScore: 8.60
Self-citation rate: 8.20%
Articles published: 479
Review time: 6-12 weeks
Journal scope: Rehabilitative and neural aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation; and hardware and software applications for rehabilitation engineering and assistive devices.