Exploring the Potential of Electroencephalography Signal-Based Image Generation Using Diffusion Models: Integrative Framework Combining Mixed Methods and Multimodal Analysis.

IF 3.1, Zone 3 (Medicine), JCR Q2 Medical Informatics

JMIR Medical Informatics Pub Date : 2025-06-25 DOI:10.2196/72027

Chi-Sheng Chen, Shao-Hsuan Chang, Che-Wei Liu, Tung-Ming Pan

Volume 13, article e72027. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12242056/pdf/

Citations: 0

Abstract

Background: Electroencephalography (EEG) has been widely used to measure brain activity, but generating accurate images from EEG signals remains a challenge. Most EEG-decoding research has focused on analysis and classification tasks such as motor imagery, emotion recognition, and brain wave classification. Some studies have explored the correlation between EEG and images, primarily focusing on EEG-image pair classification or transformation. However, EEG-based image generation remains underexplored.

Objective: The primary goal of this study was to extend EEG-based classification to image generation, addressing the limitations of previous methods and unlocking the full potential of EEG for image synthesis. To achieve more meaningful EEG-to-image generation, we developed a novel framework, Neural-Cognitive Multimodal EEG-Informed Image (NECOMIMI), which was specifically designed to generate images directly from EEG signals.

Methods: We developed a 2-stage NECOMIMI method, which integrated the novel Neural Encoding Representation Vectorizer (NERV) EEG encoder that we designed with a diffusion-based generative model. The Category-Based Assessment Table (CAT) score was introduced to evaluate the semantic quality of EEG-generated images. In addition, the ThingsEEG dataset was used to validate and benchmark the CAT score, providing a standardized measure for assessing EEG-to-image generation performance.

Results: The NERV EEG encoder achieved state-of-the-art performance in several zero-shot classification tasks, with an average accuracy of 94.8% (SD 1.7%) in the 2-way task and 86.8% (SD 3.4%) in the 4-way task, outperforming models such as Natural Image Contrast EEG, Multimodal Similarity-Keeping Contrastive Learning, and Adaptive Thinking Mapper ShallowNet. This highlighted its superiority as a feature extraction tool for EEG signals. In a 1-stage image generation framework, EEG embeddings often resulted in abstract or generalized images such as landscapes instead of specific objects. Our proposed 2-stage NECOMIMI architecture effectively extracted semantic information from noisy EEG signals, showing its ability to capture and represent underlying concepts derived from brain wave activity. We further conducted a perturbation study to test whether the model overly depended on visual cortex EEG signals for scene-based image generation. The perturbation of visual cortex EEG channels led to a notable increase in Fréchet inception distance scores, suggesting that our model relied heavily on posterior brain signals to generate semantically coherent images.
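To make the n-way zero-shot figures above concrete, the following is a minimal sketch of how such an accuracy is commonly computed from paired embeddings. It is not the authors' evaluation code: the function names, the cosine-similarity scoring, the random sampling of distractors, and the trial count are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def n_way_zero_shot_accuracy(eeg_emb, img_emb, n_way, n_trials=1000, seed=0):
    """Estimate n-way accuracy (hypothetical helper, not from the paper).

    eeg_emb[i] and img_emb[i] are assumed to be paired embeddings for
    held-out concept i. In each trial one EEG embedding is compared with
    its own image embedding plus (n_way - 1) randomly drawn distractors.
    """
    rng = np.random.default_rng(seed)
    eeg = l2_normalize(np.asarray(eeg_emb, dtype=float))
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    n_concepts = eeg.shape[0]
    if not 2 <= n_way <= n_concepts:
        raise ValueError("n_way must be between 2 and the number of concepts")

    correct = 0
    for _ in range(n_trials):
        target = int(rng.integers(n_concepts))
        others = np.delete(np.arange(n_concepts), target)
        distractors = rng.choice(others, size=n_way - 1, replace=False)
        candidates = np.concatenate(([target], distractors))
        sims = img[candidates] @ eeg[target]  # cosine similarities
        # A trial counts as correct only if the true image strictly wins,
        # so ties are scored as misses.
        correct += int(sims[0] > sims[1:].max())
    return correct / n_trials
```

Under this scheme chance level is 1/n (50% for the 2-way task, 25% for the 4-way task), which is the baseline against which the reported accuracies would be read.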

Conclusions: NECOMIMI demonstrated the potential of EEG-to-image generation, revealing the challenges of translating noisy EEG data into accurate visual representations. The novel NERV EEG encoder for multimodal contrastive learning reached state-of-the-art performance on both n-way zero-shot classification and EEG-informed image generation. The introduction of the CAT score provided a new evaluation metric, paving the way for future research to refine generative models. In addition, this study highlighted the significant clinical potential of EEG-to-image generation, particularly in enhancing brain-machine interface systems and improving quality of life for individuals with motor impairments.
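The abstract does not state which objective is used for multimodal contrastive learning. As an illustration only, the sketch below shows a symmetric, CLIP-style contrastive loss between a batch of EEG embeddings and their paired image embeddings. The function name, the temperature value, and the choice of this particular loss are assumptions, not details taken from the paper.

```python
import numpy as np


def symmetric_contrastive_loss(eeg_emb, img_emb, temperature=0.07):
    """CLIP-style loss sketch; row i of each input is assumed to be a pair."""
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = eeg @ img.T / temperature  # (batch, batch) similarity matrix

    def cross_entropy_on_diagonal(mat):
        # Numerically stable log-softmax over each row; the matching
        # pair for row i sits in column i.
        shifted = mat - mat.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the EEG-to-image and image-to-EEG directions.
    eeg_to_img = cross_entropy_on_diagonal(logits)
    img_to_eeg = cross_entropy_on_diagonal(logits.T)
    return 0.5 * (eeg_to_img + img_to_eeg)
```

Minimizing a loss of this form pulls each EEG embedding toward its paired image embedding and away from the other images in the batch, which is what makes similarity-based zero-shot classification possible afterwards.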


Source journal

JMIR Medical Informatics Medicine-Health Informatics

CiteScore

7.90

Self-citation rate

3.10%

Articles published

173

Review time

12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.