Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance

Jaehoon Joo, Taejin Jeong, Seongjae Hwang
arXiv - CS - Computer Vision and Pattern Recognition. Published 2024-09-18. Identifier: arxiv-2409.12099

Abstract

Understanding how humans process visual information is a crucial step toward unraveling the mechanisms underlying brain activity. This question has recently motivated the fMRI-to-image reconstruction task: given fMRI data recorded in response to visual stimuli, the goal is to reconstruct those stimuli. Powerful generative models such as the Latent Diffusion Model (LDM) have shown promising results in reconstructing complex visual stimuli, such as high-resolution natural images from vision datasets. Despite the impressive structural fidelity of these reconstructions, they often lack details of small objects, ambiguous shapes, and semantic nuances. Consequently, incorporating additional semantic knowledge, beyond visual information alone, becomes imperative. In light of this, we exploit the way modern LDMs incorporate multi-modal guidance (text guidance, visual guidance, and image layout) to generate structurally and semantically plausible images. Specifically, inspired by the two-streams hypothesis, which suggests that perceptual and semantic information are processed in different brain regions, our framework, Brain-Streams, maps fMRI signals from these regions to appropriate embeddings. That is, by extracting textual guidance from semantic information regions and visual guidance from perceptual information regions, Brain-Streams provides accurate multi-modal guidance to LDMs. We validate the reconstruction ability of Brain-Streams both quantitatively and qualitatively on a real fMRI dataset comprising natural image stimuli and fMRI data.
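
The two-stream mapping described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which regression model, regions of interest, or embedding sizes Brain-Streams uses. The sketch assumes plain ridge regression, synthetic data, and hypothetical names and dimensions (`fmri_semantic`, `fmri_perceptual`, `d_text`, `d_vis`), purely to show the idea of fitting one mapping per brain-region group and emitting separate text and visual guidance embeddings.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
n_trials = 200                   # number of stimulus presentations
n_vox_sem, n_vox_per = 50, 60    # voxels in "semantic" / "perceptual" regions
d_text, d_vis = 16, 32           # text / visual guidance embedding sizes

# Synthetic fMRI responses standing in for real voxel data.
fmri_semantic = rng.standard_normal((n_trials, n_vox_sem))
fmri_perceptual = rng.standard_normal((n_trials, n_vox_per))

# Synthetic target embeddings generated from known linear maps plus noise,
# so that the quality of the fit can be checked.
true_W_text = rng.standard_normal((n_vox_sem, d_text))
true_W_vis = rng.standard_normal((n_vox_per, d_vis))
text_emb = fmri_semantic @ true_W_text + 0.1 * rng.standard_normal((n_trials, d_text))
vis_emb = fmri_perceptual @ true_W_vis + 0.1 * rng.standard_normal((n_trials, d_vis))

# One mapping per stream: semantic regions -> text guidance,
# perceptual regions -> visual guidance.
W_text = fit_ridge(fmri_semantic, text_emb)
W_vis = fit_ridge(fmri_perceptual, vis_emb)


def predict_guidance(sem_voxels, per_voxels):
    """Return the two guidance embeddings a diffusion model would be conditioned on."""
    return {"text": sem_voxels @ W_text, "visual": per_voxels @ W_vis}


guidance = predict_guidance(fmri_semantic[:5], fmri_perceptual[:5])
print(guidance["text"].shape, guidance["visual"].shape)
```

In a real pipeline the predicted embeddings would be passed to an LDM as conditioning inputs; that step is omitted here because the abstract does not specify the model or its interface.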