Han Hu, Liupeng Su, Shunfu Mao, Min Chen, Guoqiang Pan, Bo Xu, Qing Zhu
{"title":"基于可变形卷积网络的多视点立体匹配自适应区域聚合","authors":"Han Hu, Liupeng Su, Shunfu Mao, Min Chen, Guoqiang Pan, Bo Xu, Qing Zhu","doi":"10.1111/phor.12459","DOIUrl":null,"url":null,"abstract":"Deep‐learning methods have demonstrated promising performance in multi‐view stereo (MVS) applications. However, it remains challenging to apply a geometrical prior on the adaptive matching windows to achieve efficient three‐dimensional reconstruction. To address this problem, this paper proposes a learnable adaptive region aggregation method based on deformable convolutional networks (DCNs), which is integrated into the feature extraction workflow for MVSNet method that uses coarse‐to‐fine structure. Following the conventional pipeline of MVSNet, a DCN is used to densely estimate and apply transformations in our feature extractor, which is composed of a deformable feature pyramid network (DFPN). Furthermore, we introduce a dedicated offset regulariser to promote the convergence of the learnable offsets of the DCN. The effectiveness of the proposed DFPN is validated through quantitative and qualitative evaluations on the BlendedMVS and Tanks and Temples benchmark datasets within a cross‐dataset evaluation setting.","PeriodicalId":22881,"journal":{"name":"The Photogrammetric Record","volume":"16 1","pages":"430 - 449"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive region aggregation for multi‐view stereo matching using deformable convolutional networks\",\"authors\":\"Han Hu, Liupeng Su, Shunfu Mao, Min Chen, Guoqiang Pan, Bo Xu, Qing Zhu\",\"doi\":\"10.1111/phor.12459\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep‐learning methods have demonstrated promising performance in multi‐view stereo (MVS) applications. However, it remains challenging to apply a geometrical prior on the adaptive matching windows to achieve efficient three‐dimensional reconstruction. To address this problem, this paper proposes a learnable adaptive region aggregation method based on deformable convolutional networks (DCNs), which is integrated into the feature extraction workflow for MVSNet method that uses coarse‐to‐fine structure. Following the conventional pipeline of MVSNet, a DCN is used to densely estimate and apply transformations in our feature extractor, which is composed of a deformable feature pyramid network (DFPN). Furthermore, we introduce a dedicated offset regulariser to promote the convergence of the learnable offsets of the DCN. 
The effectiveness of the proposed DFPN is validated through quantitative and qualitative evaluations on the BlendedMVS and Tanks and Temples benchmark datasets within a cross‐dataset evaluation setting.\",\"PeriodicalId\":22881,\"journal\":{\"name\":\"The Photogrammetric Record\",\"volume\":\"16 1\",\"pages\":\"430 - 449\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Photogrammetric Record\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1111/phor.12459\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Photogrammetric Record","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1111/phor.12459","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Adaptive region aggregation for multi‐view stereo matching using deformable convolutional networks

Abstract
Deep‐learning methods have demonstrated promising performance in multi‐view stereo (MVS) applications. However, it remains challenging to apply a geometric prior to the adaptive matching windows in order to achieve efficient three‐dimensional reconstruction. To address this problem, this paper proposes a learnable adaptive region aggregation method based on deformable convolutional networks (DCNs), which is integrated into the feature‐extraction workflow of a coarse‐to‐fine MVSNet. Following the conventional MVSNet pipeline, a DCN densely estimates and applies spatial transformations within our feature extractor, a deformable feature pyramid network (DFPN). Furthermore, we introduce a dedicated offset regulariser to promote convergence of the DCN's learnable offsets. The effectiveness of the proposed DFPN is validated through quantitative and qualitative evaluations on the BlendedMVS and Tanks and Temples benchmark datasets in a cross‐dataset evaluation setting.
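
As a rough illustration of the adaptive region aggregation idea described in the abstract, the minimal PyTorch sketch below pairs an offset‐predicting convolution with torchvision's DeformConv2d and adds a simple L2 penalty on the offsets as a stand‐in for the paper's dedicated offset regulariser. The module name, channel sizes, loss weighting and the exact form of the regulariser are assumptions made for illustration, not the authors' released implementation.

```python
# A minimal sketch (assumed, not the authors' code): a deformable convolution block
# that predicts dense sampling offsets and aggregates features over an adaptive
# region, plus a simple L2 penalty standing in for the paper's offset regulariser.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableBlock(nn.Module):
    """Adaptive region aggregation via a deformable convolution (illustrative)."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # One (dy, dx) pair per kernel tap -> 2 * k * k offset channels.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)  # start from the regular sampling grid
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x: torch.Tensor):
        offsets = self.offset_conv(x)        # dense, learnable per-pixel offsets
        feat = self.deform_conv(x, offsets)  # aggregate features over the deformed window
        # Assumed L2 form: discourages large offsets so they converge smoothly,
        # echoing (but not reproducing) the paper's dedicated offset regulariser.
        offset_reg = offsets.pow(2).mean()
        return feat, offset_reg


if __name__ == "__main__":
    block = DeformableBlock(in_ch=32, out_ch=32)
    x = torch.randn(1, 32, 64, 80)           # e.g. features at one pyramid level
    feat, reg = block(x)
    loss = feat.mean() + 0.01 * reg          # hypothetical loss weighting
    loss.backward()
    print(feat.shape, float(reg))
```

In a coarse‐to‐fine pyramid such as the DFPN described above, a block of this kind would take the place of regular convolutions at each level; zero‐initialised offsets mean training starts from a standard convolution, and the penalty keeps the learned sampling windows from drifting arbitrarily.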