fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction
Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu
arXiv - CS - Computer Vision and Pattern Recognition, 2024-09-17
DOI: arxiv-2409.11315 (https://doi.org/arxiv-2409.11315)
Abstract
Reconstructing 3D visuals from functional Magnetic Resonance Imaging (fMRI)
data, introduced as Recon3DMind in our conference work, is of significant
interest to both cognitive neuroscience and computer vision. To advance this
task, we present the fMRI-3D dataset, which includes data from 15 participants
and covers a total of 4768 3D objects. The dataset comprises two components:
fMRI-Shape, previously introduced and accessible at
https://huggingface.co/datasets/Fudan-fMRI/fMRI-Shape, and fMRI-Objaverse,
proposed in this paper and available at
https://huggingface.co/datasets/Fudan-fMRI/fMRI-Objaverse. fMRI-Objaverse
includes data from 5 subjects, 4 of whom are also part of the Core set in
fMRI-Shape, with each subject viewing 3142 3D objects across 117 categories,
all accompanied by text captions. This significantly enhances the diversity and
potential applications of the dataset. Additionally, we propose MinD-3D, a
novel framework designed to decode 3D visual information from fMRI signals. The
framework first extracts and aggregates features from fMRI data using a
neuro-fusion encoder, then employs a feature-bridge diffusion model to generate
visual features, and finally reconstructs the 3D object using a generative
transformer decoder. We establish new benchmarks by designing metrics at both
semantic and structural levels to evaluate model performance. Furthermore, we
assess our model's effectiveness in an Out-of-Distribution setting and analyze
the attribution of the extracted features and the visual ROIs in fMRI signals.
Our experiments demonstrate that MinD-3D not only reconstructs 3D objects with
high semantic and spatial accuracy but also deepens our understanding of how
the human brain processes 3D visual information. Project page at:
https://jianxgao.github.io/MinD-3D.
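
The three-stage data flow described above (neuro-fusion encoder, then feature-bridge diffusion model, then generative transformer decoder) can be pictured as a simple composition of functions. The following is a minimal sketch of that ordering only, not the authors' implementation: every function name, parameter, and toy value is hypothetical, and each stage is replaced by a trivial placeholder (averaging, a few conditioned update steps, and quantization) rather than a trained network.

```python
from typing import List, Sequence

Vector = List[float]


def neuro_fusion_encoder(frames: Sequence[Vector]) -> Vector:
    # Placeholder for stage 1 (assumption): aggregate per-frame fMRI
    # features by averaging them element-wise.
    if not frames:
        raise ValueError("at least one fMRI frame is required")
    return [sum(column) / len(frames) for column in zip(*frames)]


def feature_bridge(fmri_feature: Vector, steps: int = 4) -> Vector:
    # Placeholder for stage 2 (assumption): start from a zero vector and move
    # it halfway toward the conditioning feature at each step. This only
    # mimics the iterative, conditioned shape of a sampler; it is not a
    # diffusion model.
    visual = [0.0] * len(fmri_feature)
    for _ in range(steps):
        visual = [v + 0.5 * (c - v) for v, c in zip(visual, fmri_feature)]
    return visual


def generative_decoder(visual_feature: Vector, vocab_size: int = 16) -> List[int]:
    # Placeholder for stage 3 (assumption): quantize each feature value into
    # a discrete token id, standing in for the tokens a generative
    # transformer decoder would emit.
    tokens = []
    for value in visual_feature:
        clipped = min(max(value, 0.0), 1.0)
        tokens.append(min(int(clipped * vocab_size), vocab_size - 1))
    return tokens


def reconstruct(frames: Sequence[Vector]) -> List[int]:
    # Chain the three stages in the order the abstract describes.
    return generative_decoder(feature_bridge(neuro_fusion_encoder(frames)))


if __name__ == "__main__":
    # Two toy "fMRI frames" with three features each (made-up values).
    toy_frames = [[0.2, 0.8, 0.5], [0.4, 0.6, 0.5]]
    print(reconstruct(toy_frames))
```

The point of the sketch is the interface between stages: the encoder reduces several fMRI frames to one feature vector, the bridge maps that vector into a visual feature space, and the decoder turns the visual feature into a discrete output. In the actual framework each placeholder would be a learned model.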