CMFormer: Non-line-of-sight imaging with a memory-efficient MetaFormer network

IF 3.5 | Zone 2 (Engineering and Technology) | JCR Q2 (Optics)
Shihao Zhang, Shaohui Jin, Hao Liu, Yue Li, Xiaoheng Jiang, Mingliang Xu
Optics and Lasers in Engineering, vol. 187, Article 108875. Journal article, published 2025-02-12.
DOI: 10.1016/j.optlaseng.2025.108875
URL: https://www.sciencedirect.com/science/article/pii/S0143816625000624
Citations: 0

Abstract

Non-line-of-sight (NLOS) imaging aims to overcome the limitation that traditional sensors can only detect targets within their line of sight. While existing NLOS imaging algorithms have achieved notable imaging quality, they are constrained by significant memory requirements due to the 3D nature of transient measurements. In this paper, we propose a new memory-efficient MetaFormer-based NLOS imaging method, named CMFormer, which enables NLOS imaging with lower memory usage and faster imaging speed, facilitating deployment on consumer-grade GPUs. Specifically, we design a lightweight module based on MetaFormer, which employs multi-dimensional global convolution and multi-scale dilated convolution as token mixers. This approach exploits the strong temporal-spatial correlation more effectively because it does not separate the transient data into distinct temporal and spatial components for feature extraction. Building on the characteristics of this token mixer, we propose aggregate feature transmission to replace conventional skip connections, achieving better performance without increasing network width at the decoder stage. Additionally, to mitigate the loss of important detail features during downsampling, we design a cross-layer integration attention module that enhances the interaction between adjacent hierarchical features. With gradient checkpointing, the proposed method can be trained and run for inference on consumer-grade GPUs using significantly less memory than the current best imaging algorithm, NLOST, and achieves an imaging speed of 8 FPS. We build our pipeline on the UNet hierarchical structure, which helps the network denoise and generalize to real-world scenarios even when trained on synthetic datasets. Extensive experimental results demonstrate that our method achieves the best performance on both synthetic and real-world data with low memory cost and higher imaging speed. The code will be released soon.
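As a rough illustration of why 3D transient volumes strain GPU memory, and of how gradient checkpointing trades recomputation for memory, the sketch below estimates the activations held for the backward pass. The volume size, channel count, layer count, and segment count are hypothetical values chosen for illustration and are not taken from the paper. The estimate ignores weights, optimizer state, and workspace buffers, and it does not model CMFormer itself.

```python
# Back-of-the-envelope activation-memory estimate for a 3D transient volume.
# All sizes below are hypothetical, chosen only for illustration.

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def activation_bytes(channels, time_bins, height, width):
    """Bytes needed to store one float32 feature volume of shape (C, T, H, W)."""
    return channels * time_bins * height * width * BYTES_PER_FLOAT32


def training_activation_bytes(num_layers, per_layer_bytes, num_segments=None):
    """Approximate activation memory held for the backward pass.

    Without checkpointing, the output of every layer is kept.
    With checkpointing into `num_segments` roughly equal segments, only the
    segment boundaries are kept, plus the activations of the one segment
    that is being recomputed during the backward pass.
    """
    if num_segments is None:
        return num_layers * per_layer_bytes
    layers_per_segment = -(-num_layers // num_segments)  # ceiling division
    return (num_segments + layers_per_segment) * per_layer_bytes


# Assumed sizes: 16 channels, 512 time bins, 128 x 128 spatial grid, 16 layers.
per_layer = activation_bytes(channels=16, time_bins=512, height=128, width=128)
plain = training_activation_bytes(num_layers=16, per_layer_bytes=per_layer)
checkpointed = training_activation_bytes(16, per_layer, num_segments=4)

print(f"one feature volume : {per_layer / GIB:.2f} GiB")
print(f"no checkpointing   : {plain / GIB:.2f} GiB")
print(f"4 checkpoint segs  : {checkpointed / GIB:.2f} GiB")
```

Under these assumed sizes a single feature volume already takes 0.5 GiB, so keeping all 16 layer outputs costs 8 GiB, while four checkpoint segments cut that to about 4 GiB at the price of one extra forward pass per segment.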
Source journal
Optics and Lasers in Engineering (Engineering and Technology: Optics)
CiteScore: 8.90
Self-citation rate: 8.70%
Articles published: 384
Review time: 42 days
Journal description: Optics and Lasers in Engineering aims at providing an international forum for the interchange of information on the development of optical techniques and laser technology in engineering. Emphasis is placed on contributions targeted at the practical use of methods and devices, the development and enhancement of solutions, and new theoretical concepts for experimental methods. Optics and Lasers in Engineering reflects the main areas in which optical methods are being used and developed for an engineering environment. Manuscripts should offer clear evidence of novelty and significance. Papers focusing on parameter optimization or computational issues are not suitable. Similarly, papers focused on an application rather than the optical method fall outside the journal's scope. The scope of the journal is defined to include the following:
- Optical Metrology
- Optical Methods for 3D visualization and virtual engineering
- Optical Techniques for Microsystems
- Imaging, Microscopy and Adaptive Optics
- Computational Imaging
- Laser methods in manufacturing
- Integrated optical and photonic sensors
- Optics and Photonics in Life Science
- Hyperspectral and spectroscopic methods
- Infrared and Terahertz techniques