Xiaomei Zhang, Min Deng, Jiwei Hu, Xiao Huang, Qiwen Jin
DOI: 10.1016/j.eswa.2025.130043
Journal: Expert Systems with Applications, vol. 299, Article 130043 (published 2025-10-17; impact factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
CrysFormer++: Dual-phase refinement learning for transparent object depth estimation
Transparent object depth estimation is a critical yet challenging task in robotic perception, particularly in grasping applications for industrial automation and human-robot interaction. Because transparent materials transmit most visible light, depth sensors often suffer severe depth measurement errors, leading to inaccuracies in grasp planning and object manipulation. To address this issue, we propose a Mamba-Transformer hybrid encoding framework (CrysFormer++) for robust depth estimation of transparent objects. The model integrates VMamba to efficiently model global long-range dependencies and leverages Swin Transformer to capture fine-grained local features. In addition, we develop a self-supervised confidence learning framework that generates pixel-wise reliability maps through photometric consistency constraints and adaptively fuses raw depth measurements with network predictions via physics-informed spatial weighting. We also design a novel loss function to enhance the accuracy and robustness of depth prediction. Extensive experiments on the TransCG and ClearGrasp datasets show that CrysFormer++ outperforms existing state-of-the-art approaches in both visual quality and quantitative metrics. The results confirm the effectiveness of CrysFormer++ in handling complex backgrounds, providing a high-precision depth perception solution for robotic grasping of transparent objects.
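The abstract describes adaptive fusion of raw depth measurements and network predictions guided by a pixel-wise reliability map, but gives no formula. The sketch below shows one plausible form of such confidence-weighted fusion; the function name `fuse_depth` and the simple convex-combination weighting are illustrative assumptions, not the paper's actual physics-informed scheme.

```python
import numpy as np

def fuse_depth(raw_depth, pred_depth, confidence):
    """Blend raw sensor depth with network predictions per pixel.

    confidence: reliability of the raw measurement in [0, 1].
    This convex combination is a hypothetical stand-in for the
    paper's physics-informed spatial weighting.
    """
    confidence = np.clip(confidence, 0.0, 1.0)
    return confidence * raw_depth + (1.0 - confidence) * pred_depth

# On transparent pixels the sensor reading is unreliable (low confidence),
# so the network prediction dominates; on opaque pixels the raw depth is kept.
raw = np.array([[0.0, 1.2], [1.1, 1.0]])     # sensor reads 0 where light passes through
pred = np.array([[0.9, 1.2], [1.1, 1.0]])    # network fills the hole
conf = np.array([[0.05, 0.95], [0.9, 0.9]])  # low confidence on the transparent pixel
fused = fuse_depth(raw, pred, conf)
```

In this toy example the missing reading at the transparent pixel is replaced almost entirely by the network's estimate, while well-measured pixels stay close to the sensor value.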
Journal overview:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.