Title: Structured Light Image Planar-Topography Feature Decomposition for Generalizable 3D Shape Measurement
Authors: Mingyang Lei; Jingfan Fan; Long Shao; Hong Song; Deqiang Xiao; Danni Ai; Tianyu Fu; Yucong Lin; Ying Gu; Jian Yang
DOI: 10.1109/TCSVT.2025.3558732
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9517-9529
Published: 2025-04-08
URL: https://ieeexplore.ieee.org/document/10955736/
Citations: 0
Abstract
The application of structured light (SL) techniques has achieved remarkable success in three-dimensional (3D) measurements. Traditional methods generally calculate SL information pixel by pixel to obtain the measurement results. Recently, the rise of deep learning (DL) has led to significant developments in this task. However, existing DL-based methods generally learn all features within the image in an end-to-end manner, ignoring the distinction between SL and non-SL information. Therefore, these methods may encounter difficulties in focusing on subtle variations in SL patterns across different scenes, thereby degrading measurement precision. To overcome this challenge, we propose a novel SL Image Planar-Topography Feature Decomposition Network (SIDNet). To fully utilize the information from different SL modality images (fringe and speckle), we decompose different modalities into topography features (modality-specific) and planar features (modality-shared). A physics-driven decomposition loss is proposed to make the topography/planar features dissimilar/similar, which guides the network to distinguish between SL and non-SL information. Moreover, to obtain modality-fused features with global overview and local detail information, we propose a wrapped phase-driven feature fusion module. Specifically, a novel Tri-modality Mamba block is designed to integrate different sources with the guidance of the wrapped phase features. Extensive experiments demonstrate the superiority of our SIDNet in multiple simulated 3D measurement scenes. Moreover, our method shows better generalization ability than other DL models and can be directly applied to unseen real-world scenes.
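The abstract names two computable ingredients: a decomposition loss that makes topography (modality-specific) features dissimilar and planar (modality-shared) features similar, and wrapped phase features that guide fusion. The sketch below illustrates both under stated assumptions: the cosine-similarity form of the loss and all function names are hypothetical (the abstract does not specify the exact formulation), while `wrapped_phase` is the standard N-step phase-shifting profilometry formula, not anything specific to SIDNet.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def decomposition_loss(topo_fringe, topo_speckle, planar_fringe, planar_speckle):
    """Hypothetical decomposition loss: push the modality-specific
    (topography) features of the fringe and speckle branches apart,
    and pull the modality-shared (planar) features together."""
    # Topography features should be dissimilar across modalities:
    # penalize residual similarity (squared so the term is non-negative).
    l_topo = cosine_similarity(topo_fringe, topo_speckle) ** 2
    # Planar features should be similar: penalize low similarity.
    l_planar = 1.0 - cosine_similarity(planar_fringe, planar_speckle)
    return l_topo + l_planar

def wrapped_phase(intensities):
    """Wrapped phase at one pixel from N equally shifted fringe
    intensities I_k = A + B*cos(phi + 2*pi*k/N), N >= 3 (standard
    N-step phase-shifting profilometry)."""
    n = len(intensities)
    s = sum(i * math.sin(2 * math.pi * k / n) for k, i in enumerate(intensities))
    c = sum(i * math.cos(2 * math.pi * k / n) for k, i in enumerate(intensities))
    return math.atan2(-s, c)  # wrapped into (-pi, pi]
```

A loss of zero is reached exactly when the two topography vectors are orthogonal and the two planar vectors point in the same direction, which matches the dissimilar/similar objective stated in the abstract.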
About the journal
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.