Reconstructing continuous language from brain signals measured by an fMRI-based brain-computer interface

Brain-X · Publication date: 2024-10-08 · DOI: 10.1002/brx2.70001
Shurui Li, Yuanning Li, Ru-Yuan Zhang
{"title":"Reconstructing continuous language from brain signals measured by fMRI based brain-computer interface","authors":"Shurui Li,&nbsp;Yuanning Li,&nbsp;Ru-Yuan Zhang","doi":"10.1002/brx2.70001","DOIUrl":null,"url":null,"abstract":"<p>Brain-computer interfaces (BCIs) are designed to bridge the gap between human neural activity and external devices. Previous studies have shown that speech and text can be decoded from signals recorded from intracranial electrodes.<span><sup>1</sup></span> Such applications can be used to develop neuroprostheses to restore speech function in patients with brain and psychiatric disorders.<span><sup>2</sup></span> These methods largely rely on invasive intracranial neural recordings that provide signals with high spatiotemporal resolution and high signal-to-noise ratio. Despite the advantage of being non-invasive, low temporal resolution means functional magnetic resonance imaging (fMRI) has rarely been used in this context to decode continuous speech, with its application primarily limited to coarse classification tasks.<span><sup>3</sup></span></p><p>Despite this, fMRI-based neural encoding models have seen great progress in the last decade. For example, voxel-wise neural responses to continuous natural speech can be predicted using feature embeddings extracted from language models.<span><sup>4</sup></span> To reconstruct continuous speech from fMRI, three obstacles must be overcome. First, the brain's semantic representation regions are not clearly defined—previous research suggests a distributed network across various brain areas. Second, due to its temporal sluggishness, a single fMRI time point captures information from multiple preceding words within a 6–10-s window. Third, constraining the semantic space in language construction is challenging, as existing fMRI data capture only a fraction of the real semantic richness.</p><p>In a recently published study,<span><sup>5</sup></span> Tang and colleagues propose a Bayesian method to decode continuous language from brain responses measured by fMRI. Unlike previous attempts to decode semantic vectors (<i>S</i>) directly from brain responses (<i>R</i>), this study used brain responses as a control condition for language generation models. The goal was to invert the encoding model to identify the most appropriate stimulus. According to Bayesian theory, the decoder estimates the posterior distribution <i>P</i>(<i>S</i>|<i>R</i>) and finds the stimuli <i>S</i> that maximizes the posterior distribution given the neural response <i>R</i>. Instead of directly building decoders that estimate <i>P</i>(<i>S</i>|<i>R</i>), which is usually intractable due to the aforementioned difficulties, the authors took advantage of the Bayesian decoding framework that <i>P</i>(<i>S</i>|<i>R</i>) ∝ <i>P</i>(<i>S</i>)<i>P</i>(<i>R</i>|<i>S</i>) and focused instead on the encoding model <i>P</i>(<i>R</i>|<i>S</i>).</p><p>This work successfully overcame the three main barriers to fMRI-based language decoding. First, to localize the brain voxels containing semantic information, encoding performance was used as a metric to select voxels for decoding. Second, to deal with the temporal sluggishness of blood oxygen level-dependent (BOLD) signals, the semantic information for 10 s preceding each repetition time was used to build the encoding model. 
Third, to ensure that meaningful and readable sentences could be reconstructed, the language model GPT-1 was used to parameterize the prior distribution <i>P</i>(<i>S</i>) over the entire semantic space. GPT-1 uses an autoregressive model to predict words based on prior context, enabling natural language generation. Additionally, a beam search algorithm was used to maintain a relatively large and stable candidate pool.</p><p>We note several differences between non-invasive fMRI-based and invasive electrophysiology-based language decoding. The success of language decoding in this study is mainly due to the distributed nature of semantic representations in the brain, and the fact that semantic representations during speech perception can be reliably captured by BOLD signals. However, semantic space is highly multi-dimensional, continuous, and infinite. Invasive speech BCIs rely on electrophysiological signals with high temporal resolution from the sensorimotor cortex; finite, discrete sets of decoding targets, such as phonemes or letters, result in relatively low word error rates. Nevertheless, the semantic reconstruction approach proposed in this study is promising for decoding higher-level amodal concepts, for example, the decoding of text from silent videos, which cannot be easily achieved by invasive speech-motor BCIs.</p><p>Despite the many advantages mentioned above, this work still has some limitations. First, in the Bayesian decoding framework, the effectiveness of the decoder depends heavily on the performance of the encoding model. GPT-1 embeddings may represent only a subset of the semantic information in the brain. For example, in this work, only well-encoded voxels were used for decoding. The remaining voxels are probably also involved in semantic representation, but cannot be encoded by GPT-1 embeddings. Second, this work assumed that the total brain response is the sum of responses to semantics in previous time points. This assumption may not be consistent with the actual activation process in the brain.</p><p>Despite its limitations, this study sheds new light on non-invasive BCI techniques. We see several promising directions for BCIs in the future. First, safer, portable, and durable invasive BCIs could help thousands of patients with neurological disorders to express their thoughts. Second, cheaper, smaller non-invasive BCIs may have clinical and entertainment applications, such as in the metaverse. Finally, it is also crucial to improve the temporal resolution of non-invasive BCIs. For example, combination with electroencephalogram or magnetoencephalography data could compensate for the low temporal resolution of fMRI. With higher temporal resolution, the decoder could use both semantic and sensorimotor information to improve reconstruction accuracy.</p><p><b>Shurui Li</b>: Conceptualization; formal analysis; visualization; writing—original draft. <b>Yuanning Li</b>: Conceptualization; funding acquisition; investigation; resources; supervision; validation; visualization; writing—review and editing. 
<b>Ru-Yuan Zhang</b>: Conceptualization; formal analysis; funding acquisition; project administration; resources; supervision; validation; visualization; writing—original draft; writing—review and editing.</p><p>The authors declare no competing interests.</p><p>This is a commentary paper with no empirical experiment.</p>","PeriodicalId":94303,"journal":{"name":"Brain-X","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/brx2.70001","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Brain-X","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/brx2.70001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Brain-computer interfaces (BCIs) are designed to bridge the gap between human neural activity and external devices. Previous studies have shown that speech and text can be decoded from signals recorded by intracranial electrodes.1 Such applications can be used to develop neuroprostheses that restore speech function in patients with neurological and psychiatric disorders.2 These methods rely largely on invasive intracranial recordings, which provide signals with high spatiotemporal resolution and a high signal-to-noise ratio. Although functional magnetic resonance imaging (fMRI) has the advantage of being non-invasive, its low temporal resolution means it has rarely been used to decode continuous speech; its application in this context has been limited primarily to coarse classification tasks.3

Despite this, fMRI-based neural encoding models have seen great progress in the last decade. For example, voxel-wise neural responses to continuous natural speech can be predicted using feature embeddings extracted from language models.4 To reconstruct continuous speech from fMRI, three obstacles must be overcome. First, the brain regions that carry semantic representations are not clearly defined; previous research suggests a network distributed across many brain areas. Second, because of the temporal sluggishness of the hemodynamic response, a single fMRI time point captures information from multiple preceding words within a 6–10 s window. Third, constraining the semantic space during language reconstruction is challenging, as existing fMRI data capture only a fraction of the true semantic richness.
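To make the second obstacle concrete, the sketch below is our own illustration, not code from the original study; the repetition time, window length, and function names are assumptions made only for this example. It shows one simple way in which the word embeddings falling in the seconds preceding each fMRI acquisition could be pooled into a single feature vector, so that one scan mixes information from many preceding words.

```python
import numpy as np

# Illustrative sketch (not the authors' code): pooling the embeddings of all words
# spoken in the ~10 s preceding each fMRI acquisition (TR) into one feature vector.
TR = 2.0        # assumed repetition time in seconds
WINDOW = 10.0   # assumed hemodynamic window in seconds

def lagged_features(word_times, word_embeddings, n_trs):
    """word_times: (n_words,) onset times in s; word_embeddings: (n_words, dim)."""
    dim = word_embeddings.shape[1]
    features = np.zeros((n_trs, dim))
    for t in range(n_trs):
        scan_time = (t + 1) * TR
        # words whose onsets fall inside the window preceding this scan
        mask = (word_times > scan_time - WINDOW) & (word_times <= scan_time)
        if mask.any():
            features[t] = word_embeddings[mask].mean(axis=0)
    return features
```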

In a recently published study,5 Tang and colleagues proposed a Bayesian method to decode continuous language from brain responses measured by fMRI. Unlike previous attempts to decode semantic vectors (S) directly from brain responses (R), this study used the brain responses to constrain a language generation model. The goal was to invert the encoding model to identify the most likely stimulus. In Bayesian terms, the decoder estimates the posterior distribution P(S|R) and searches for the stimulus S that maximizes this posterior given the neural response R. Instead of directly building a decoder that estimates P(S|R), which is usually intractable because of the difficulties listed above, the authors exploited the Bayesian factorization P(S|R) ∝ P(S)P(R|S) and focused instead on the encoding model P(R|S).
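The following sketch is our illustration of this factorization, not the paper's implementation; lm_log_prior and encoding_log_likelihood are hypothetical stand-ins for a language-model prior and an fMRI encoding model. It shows how candidate word sequences can be ranked by log P(S) + log P(R|S) without ever modeling P(S|R) directly.

```python
import numpy as np

# Illustrative sketch of the Bayesian scoring rule P(S|R) ∝ P(S) P(R|S).
# `lm_log_prior` and `encoding_log_likelihood` are assumed callables, not
# functions from the original study.

def log_posterior(candidate, response, lm_log_prior, encoding_log_likelihood):
    """Unnormalized log posterior of a candidate word sequence given a BOLD response."""
    return lm_log_prior(candidate) + encoding_log_likelihood(response, candidate)

def best_candidate(candidates, response, lm_log_prior, encoding_log_likelihood):
    """Return the candidate sequence with the highest unnormalized posterior."""
    scores = [log_posterior(c, response, lm_log_prior, encoding_log_likelihood)
              for c in candidates]
    return candidates[int(np.argmax(scores))]
```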

This work successfully overcame the three main barriers to fMRI-based language decoding. First, to localize the brain voxels that carry semantic information, encoding performance was used as the criterion for selecting voxels for decoding. Second, to deal with the temporal sluggishness of the blood oxygen level-dependent (BOLD) signal, the semantic information from the 10 s preceding each repetition time was used to build the encoding model. Third, to ensure that meaningful and readable sentences could be reconstructed, the language model GPT-1 was used to parameterize the prior distribution P(S) over the semantic space. GPT-1 is an autoregressive model that predicts each word from its preceding context, which enables natural language generation. In addition, a beam search algorithm was used to maintain a relatively large and stable pool of candidate sequences.
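As a rough illustration of how the language-model prior and the encoding-model likelihood interact during generation, the beam search sketch below keeps a fixed-size pool of partial sentences and expands the best-scoring ones. It is a minimal sketch under our own assumptions: propose_continuations and score are hypothetical callables, not the study's actual implementation.

```python
# Illustrative beam search over candidate word sequences (not the authors' code).
# `propose_continuations(seq, k)` is assumed to return k plausible next words from
# an autoregressive language model; `score(seq, response)` is assumed to return the
# unnormalized log posterior log P(seq) + log P(response | seq).

def beam_search_decode(response, propose_continuations, score,
                       beam_width=10, max_words=20):
    beams = [[]]  # start from an empty word sequence
    for _ in range(max_words):
        expanded = []
        for seq in beams:
            for word in propose_continuations(seq, beam_width):
                expanded.append(seq + [word])
        # keep only the beam_width highest-scoring candidate sequences
        expanded.sort(key=lambda seq: score(seq, response), reverse=True)
        beams = expanded[:beam_width]
    return beams[0]  # highest-scoring reconstruction
```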

We note several differences between non-invasive fMRI-based and invasive electrophysiology-based language decoding. The success of language decoding in this study is due mainly to the distributed nature of semantic representations in the brain, and to the fact that semantic representations during speech perception can be reliably captured by BOLD signals. However, the semantic space is high-dimensional, continuous, and effectively unbounded. Invasive speech BCIs, by contrast, rely on high-temporal-resolution electrophysiological signals from the sensorimotor cortex; their finite, discrete sets of decoding targets, such as phonemes or letters, yield relatively low word error rates. Nevertheless, the semantic reconstruction approach proposed in this study is promising for decoding higher-level amodal concepts, for example text describing silent videos, which cannot easily be achieved by invasive speech-motor BCIs.

Despite the many advantages mentioned above, this work still has some limitations. First, in the Bayesian decoding framework, the effectiveness of the decoder depends heavily on the performance of the encoding model. GPT-1 embeddings may represent only a subset of the semantic information in the brain. For example, in this work only well-encoded voxels were used for decoding; the remaining voxels probably also participate in semantic representation but cannot be predicted from GPT-1 embeddings. Second, this work assumed that the total brain response is the sum of the responses to the semantic content at preceding time points. This linear summation assumption may not match the actual response dynamics of the brain.
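The second limitation refers to the linear summation assumption built into such encoding models. The sketch below is a minimal formulation of that assumption in our own notation, with an assumed finite set of lags; it is not the study's model, only an illustration of what "the response is a sum over preceding time points" means in practice.

```python
import numpy as np

# Minimal sketch of the linear summation assumption discussed above (our
# formulation): the predicted BOLD response of one voxel at time t is a weighted
# sum of the semantic features at a few preceding delays, r(t) = sum_d w_d . s(t - d).

def predict_bold(features, lag_weights):
    """features: (n_trs, dim) semantic features; lag_weights: list of (dim,) weight vectors."""
    n_trs = features.shape[0]
    predicted = np.zeros(n_trs)
    for delay, w in enumerate(lag_weights, start=1):
        shifted = np.roll(features, delay, axis=0)
        shifted[:delay] = 0.0  # zero-pad the first time points instead of wrapping
        predicted += shifted @ w
    return predicted
```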

Despite its limitations, this study sheds new light on non-invasive BCI techniques. We see several promising directions for future BCIs. First, safer, portable, and durable invasive BCIs could help thousands of patients with neurological disorders express their thoughts. Second, cheaper and smaller non-invasive BCIs may find clinical and entertainment applications, for example in the metaverse. Finally, improving the temporal resolution of non-invasive BCIs is also crucial. For example, combining fMRI with electroencephalography or magnetoencephalography data could compensate for its low temporal resolution. With higher temporal resolution, a decoder could use both semantic and sensorimotor information to improve reconstruction accuracy.

Shurui Li: Conceptualization; formal analysis; visualization; writing—original draft. Yuanning Li: Conceptualization; funding acquisition; investigation; resources; supervision; validation; visualization; writing—review and editing. Ru-Yuan Zhang: Conceptualization; formal analysis; funding acquisition; project administration; resources; supervision; validation; visualization; writing—original draft; writing—review and editing.

The authors declare no competing interests.

This is a commentary paper with no empirical experiment.
