A foundation model empowered by a multi-modal prompt engine for universal seismic geobody interpretation across surveys

Hang Gao, Xinming Wu, Luming Liang, Hanlin Sheng, Xu Si, Gao Hui, Yaxing Li

arXiv:2409.04962 (arXiv - PHYS - Geophysics), published 2024-09-08
Seismic geobody interpretation is crucial for structural geology studies and various engineering applications. Existing deep learning methods show promise but lack support for multi-modal inputs and struggle to generalize to different geobody types or surveys. We introduce a promptable foundation model for interpreting any geobodies across seismic surveys. This model integrates a pre-trained vision foundation model (VFM) with a sophisticated multi-modal prompt engine. The VFM, pre-trained on massive natural images and fine-tuned on seismic data, provides robust feature extraction for cross-survey generalization. The prompt engine incorporates multi-modal prior information to iteratively refine geobody delineation. Extensive experiments demonstrate the model's superior accuracy, scalability from 2D to 3D, and generalizability to various geobody types, including those unseen during training. To our knowledge, this is the first highly scalable and versatile multi-modal foundation model capable of interpreting any geobodies across surveys while supporting real-time interactions. Our approach establishes a new paradigm for geoscientific data interpretation, with broad potential for transfer to other tasks.
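
The abstract gives no implementation details, but the described design (a VFM image encoder plus a promptable decoder that iteratively refines a mask from multi-modal priors) maps naturally onto a SAM-style interactive segmenter. Below is a minimal, hypothetical PyTorch sketch of that pattern; every name in it (SeismicGeobodyModel, PromptEncoder, MaskDecoder, interpret) is invented for illustration and is not the authors' code.

```python
# Hedged sketch of a SAM-style promptable segmenter for seismic sections.
# Assumptions (not from the paper): prompts are click points plus a prior
# mask, and the predicted mask is fed back as the next round's prior.
import torch
import torch.nn as nn


class PromptEncoder(nn.Module):
    """Embeds multi-modal prompts; here just click points and a prior mask."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.point_embed = nn.Linear(2, dim)    # (x, y) click coordinates
        self.mask_embed = nn.Conv2d(1, dim, 1)  # prior / previously predicted mask

    def forward(self, points, prior_mask):
        return self.point_embed(points), self.mask_embed(prior_mask)


class MaskDecoder(nn.Module):
    """Fuses image features with prompt embeddings and predicts mask logits."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.fuse = nn.Conv2d(2 * dim, dim, 3, padding=1)
        self.head = nn.Conv2d(dim, 1, 1)

    def forward(self, feats, point_emb, mask_emb):
        # Broadcast the pooled point embedding over the spatial grid, then fuse.
        p = point_emb.mean(dim=1)[:, :, None, None].expand_as(feats)
        x = torch.relu(self.fuse(torch.cat([feats + mask_emb, p], dim=1)))
        return self.head(x)


class SeismicGeobodyModel(nn.Module):
    def __init__(self, image_encoder, dim: int = 256):
        super().__init__()
        self.image_encoder = image_encoder  # the VFM, fine-tuned on seismic data
        self.prompt_encoder = PromptEncoder(dim)
        self.mask_decoder = MaskDecoder(dim)

    @torch.no_grad()
    def interpret(self, section, points, n_rounds: int = 3):
        """Iteratively refine: each predicted mask becomes the next prior."""
        feats = self.image_encoder(section)      # encode once, reuse every round
        mask = torch.zeros_like(section[:, :1])  # start from an empty prior
        for _ in range(n_rounds):
            p_emb, m_emb = self.prompt_encoder(points, mask)
            mask = torch.sigmoid(self.mask_decoder(feats, p_emb, m_emb))
        return mask


# Toy stand-in for the VFM so the sketch runs end to end; a real system would
# plug in a pre-trained ViT-style backbone here instead.
encoder = nn.Sequential(nn.Conv2d(1, 256, 3, padding=1), nn.ReLU())
model = SeismicGeobodyModel(encoder)
section = torch.randn(1, 1, 64, 64)  # one 2D seismic section
points = torch.rand(1, 2, 2)         # two normalized user clicks
print(model.interpret(section, points).shape)  # torch.Size([1, 1, 64, 64])
```

The loop is the point of the sketch: image features are encoded once and reused across refinement rounds, while each predicted mask re-enters the prompt encoder as a prior, which is what makes iterative, real-time interaction plausible in such a design.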