PS-Transformer:使用自注意机制学习稀疏光度立体网络

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-11-21 DOI:10.48550/arXiv.2211.11386

Satoshi Ikehata

{"title":"PS-Transformer:使用自注意机制学习稀疏光度立体网络","authors":"Satoshi Ikehata","doi":"10.48550/arXiv.2211.11386","DOIUrl":null,"url":null,"abstract":"Existing deep calibrated photometric stereo networks basically aggregate observations under different lights based on the pre-defined operations such as linear projection and max pooling. While they are effective with the dense capture, simple first-order operations often fail to capture the high-order interactions among observations under small number of different lights. To tackle this issue, this paper presents a deep sparse calibrated photometric stereo network named {\\it PS-Transformer} which leverages the learnable self-attention mechanism to properly capture the complex inter-image interactions. PS-Transformer builds upon the dual-branch design to explore both pixel-wise and image-wise features and individual feature is trained with the intermediate surface normal supervision to maximize geometric feasibility. A new synthetic dataset named CyclesPS+ is also presented with the comprehensive analysis to successfully train the photometric stereo networks. Extensive results on the publicly available benchmark datasets demonstrate that the surface normal prediction accuracy of the proposed method significantly outperforms other state-of-the-art algorithms with the same number of input images and is even comparable to that of dense algorithms which input 10$\\times$ larger number of images.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"11 1","pages":"30"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism\",\"authors\":\"Satoshi Ikehata\",\"doi\":\"10.48550/arXiv.2211.11386\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing deep calibrated photometric stereo networks basically aggregate observations under different lights based on the pre-defined operations such as linear projection and max pooling. While they are effective with the dense capture, simple first-order operations often fail to capture the high-order interactions among observations under small number of different lights. To tackle this issue, this paper presents a deep sparse calibrated photometric stereo network named {\\\\it PS-Transformer} which leverages the learnable self-attention mechanism to properly capture the complex inter-image interactions. PS-Transformer builds upon the dual-branch design to explore both pixel-wise and image-wise features and individual feature is trained with the intermediate surface normal supervision to maximize geometric feasibility. A new synthetic dataset named CyclesPS+ is also presented with the comprehensive analysis to successfully train the photometric stereo networks. Extensive results on the publicly available benchmark datasets demonstrate that the surface normal prediction accuracy of the proposed method significantly outperforms other state-of-the-art algorithms with the same number of input images and is even comparable to that of dense algorithms which input 10$\\\\times$ larger number of images.\",\"PeriodicalId\":72437,\"journal\":{\"name\":\"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference\",\"volume\":\"11 1\",\"pages\":\"30\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2211.11386\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2211.11386","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

现有的深度标定光度立体网络基本上是基于线性投影和最大池化等预定义操作，对不同光照下的观测数据进行聚合。虽然它们对于密集捕获是有效的，但简单的一阶操作通常无法捕获在少量不同光线下观测值之间的高阶相互作用。为了解决这个问题，本文提出了一个名为{\it PS-Transformer}的深度稀疏校准光度立体网络，该网络利用可学习的自注意机制来正确捕获复杂的图像间交互。PS-Transformer基于双分支设计来探索像素和图像特征，并且使用中间表面法线监督来训练单个特征，以最大化几何可行性。本文还提出了一个新的合成数据集CyclesPS+，并进行了综合分析，成功地训练了光度立体网络。在公开可用的基准数据集上的大量结果表明，该方法的表面法向预测精度显著优于其他具有相同输入图像数量的最先进算法，甚至可以与输入图像数量大10倍的密集算法相媲美。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism

Existing deep calibrated photometric stereo networks basically aggregate observations under different lights based on the pre-defined operations such as linear projection and max pooling. While they are effective with the dense capture, simple first-order operations often fail to capture the high-order interactions among observations under small number of different lights. To tackle this issue, this paper presents a deep sparse calibrated photometric stereo network named {\it PS-Transformer} which leverages the learnable self-attention mechanism to properly capture the complex inter-image interactions. PS-Transformer builds upon the dual-branch design to explore both pixel-wise and image-wise features and individual feature is trained with the intermediate surface normal supervision to maximize geometric feasibility. A new synthetic dataset named CyclesPS+ is also presented with the comprehensive analysis to successfully train the photometric stereo networks. Extensive results on the publicly available benchmark datasets demonstrate that the surface normal prediction accuracy of the proposed method significantly outperforms other state-of-the-art algorithms with the same number of input images and is even comparable to that of dense algorithms which input 10$\times$ larger number of images.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference

自引率

0.00%

发文量