fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction

Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu
arXiv - CS - Computer Vision and Pattern Recognition, published 2024-09-17. DOI: arxiv-2409.11315

Abstract

Reconstructing 3D visuals from functional Magnetic Resonance Imaging (fMRI) data, introduced as Recon3DMind in our conference work, is of significant interest to both cognitive neuroscience and computer vision. To advance this task, we present the fMRI-3D dataset, which includes data from 15 participants and showcases a total of 4768 3D objects. The dataset comprises two components: fMRI-Shape, previously introduced and accessible at https://huggingface.co/datasets/Fudan-fMRI/fMRI-Shape, and fMRI-Objaverse, proposed in this paper and available at https://huggingface.co/datasets/Fudan-fMRI/fMRI-Objaverse. fMRI-Objaverse includes data from 5 subjects, 4 of whom are also part of the Core set in fMRI-Shape, with each subject viewing 3142 3D objects across 117 categories, all accompanied by text captions. This significantly enhances the diversity and potential applications of the dataset. Additionally, we propose MinD-3D, a novel framework designed to decode 3D visual information from fMRI signals. The framework first extracts and aggregates features from fMRI data using a neuro-fusion encoder, then employs a feature-bridge diffusion model to generate visual features, and finally reconstructs the 3D object using a generative transformer decoder. We establish new benchmarks by designing metrics at both semantic and structural levels to evaluate model performance. Furthermore, we assess our model's effectiveness in an Out-of-Distribution setting and analyze the attribution of the extracted features and the visual ROIs in fMRI signals. Our experiments demonstrate that MinD-3D not only reconstructs 3D objects with high semantic and spatial accuracy but also deepens our understanding of how the human brain processes 3D visual information. Project page: https://jianxgao.github.io/MinD-3D.
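The three-stage data flow described above (encode and aggregate fMRI features, bridge them to visual features, decode an object representation) can be illustrated with a toy sketch. This is not the MinD-3D implementation: every function name, parameter, and dimension below is hypothetical, and each stage is a trivial placeholder (frame averaging, a fixed-step interpolation from noise toward the conditioning vector, and a simple quantizer) standing in for the paper's neuro-fusion encoder, feature-bridge diffusion model, and generative transformer decoder.

```python
import random
from typing import List

Vector = List[float]


def neuro_fusion_encoder(fmri_frames: List[Vector]) -> Vector:
    """Stage 1 placeholder: aggregate per-frame fMRI vectors by averaging."""
    if not fmri_frames:
        raise ValueError("need at least one fMRI frame")
    n = len(fmri_frames)
    dim = len(fmri_frames[0])
    return [sum(frame[i] for frame in fmri_frames) / n for i in range(dim)]


def feature_bridge(brain_feature: Vector, steps: int = 10, seed: int = 0) -> Vector:
    """Stage 2 placeholder: start from Gaussian noise and move halfway toward
    the conditioning feature at every step. This only mimics the shape of an
    iterative, conditioned generation loop; it is not a diffusion model."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in brain_feature]
    for _ in range(steps):
        x = [xi + 0.5 * (ci - xi) for xi, ci in zip(x, brain_feature)]
    return x


def decode_tokens(visual_feature: Vector, vocab_size: int = 256) -> List[int]:
    """Stage 3 placeholder: quantize each feature value into a discrete token,
    standing in for a decoder that emits a 3D object representation."""
    return [int(round(abs(v) * 100)) % vocab_size for v in visual_feature]


def reconstruct(fmri_frames: List[Vector]) -> List[int]:
    """Chain the three placeholder stages in the order the abstract gives."""
    brain_feature = neuro_fusion_encoder(fmri_frames)
    visual_feature = feature_bridge(brain_feature)
    return decode_tokens(visual_feature)


if __name__ == "__main__":
    # Two made-up "frames" of a 4-dimensional signal, for illustration only.
    frames = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
    print(reconstruct(frames))
```

The point of the sketch is only the ordering and the interfaces: the encoder reduces several fMRI frames to one feature, the bridge conditions its generation on that feature, and the decoder consumes the bridged feature alone.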