{"title":"基于LoFTR和MAGSAC++的深度学习不同视点图像匹配","authors":"Liang Tian","doi":"10.1145/3582177.3582181","DOIUrl":null,"url":null,"abstract":"Matching 2D images from different viewpoints plays a crucial role in the fields of Structure-from-Motion and 3D reconstruction. However, image matching for assorted and unstructured images with a wide variety of viewpoints leads to difficulty for traditional matching methods. In this paper, we propose a Transformer-based feature matching approach to capture the same physical points of a scene from two images with different viewpoints. The local features of images are extracted by the LoFTR, which is a detector-free deep-learning matching model on the basis of Transformer. The subsequent matching process is realized by the MAGSAC++ estimator, where the matching results are summarized in the fundamental matrix as the model output. By removing image feature points with low confidence scores and applying the test time augmentation, our approach can reach a mean Average Accuracy 0.81340 in the Kaggle competition Image Matching Challenge 2022. This score ranks 45/642 in the competition leaderboard, and can get a silver medal in this competition. Our work could help accelerate the research of generalized methods for Structure-from-Motion and 3D reconstruction, and would potentially deepen the understanding of image feature matching and related fields.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Matching Images from Different Viewpoints with Deep Learning Based on LoFTR and MAGSAC++\",\"authors\":\"Liang Tian\",\"doi\":\"10.1145/3582177.3582181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Matching 2D images from different viewpoints plays a crucial role in the fields of Structure-from-Motion and 3D reconstruction. However, image matching for assorted and unstructured images with a wide variety of viewpoints leads to difficulty for traditional matching methods. In this paper, we propose a Transformer-based feature matching approach to capture the same physical points of a scene from two images with different viewpoints. The local features of images are extracted by the LoFTR, which is a detector-free deep-learning matching model on the basis of Transformer. The subsequent matching process is realized by the MAGSAC++ estimator, where the matching results are summarized in the fundamental matrix as the model output. By removing image feature points with low confidence scores and applying the test time augmentation, our approach can reach a mean Average Accuracy 0.81340 in the Kaggle competition Image Matching Challenge 2022. This score ranks 45/642 in the competition leaderboard, and can get a silver medal in this competition. 
Our work could help accelerate the research of generalized methods for Structure-from-Motion and 3D reconstruction, and would potentially deepen the understanding of image feature matching and related fields.\",\"PeriodicalId\":170327,\"journal\":{\"name\":\"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3582177.3582181\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3582177.3582181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Matching Images from Different Viewpoints with Deep Learning Based on LoFTR and MAGSAC++
Matching 2D images taken from different viewpoints plays a crucial role in Structure-from-Motion and 3D reconstruction. However, matching assorted, unstructured images spanning a wide variety of viewpoints is difficult for traditional methods. In this paper, we propose a Transformer-based feature matching approach that captures the same physical points of a scene in two images taken from different viewpoints. Local image features are extracted by LoFTR, a detector-free deep-learning matching model built on the Transformer architecture. The subsequent matching step is carried out by the MAGSAC++ estimator, which summarizes the matches in a fundamental matrix as the model output. By removing image feature points with low confidence scores and applying test-time augmentation, our approach reaches a mean Average Accuracy (mAA) of 0.81340 in the Kaggle Image Matching Challenge 2022, ranking 45th out of 642 on the competition leaderboard and earning a silver medal. Our work could help accelerate research on generalized methods for Structure-from-Motion and 3D reconstruction, and may deepen the understanding of image feature matching and related fields.
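
The abstract describes a two-stage pipeline: LoFTR produces putative correspondences with confidence scores, low-confidence matches are discarded, and MAGSAC++ robustly estimates the fundamental matrix. Below is a minimal sketch of that pipeline, assuming the kornia implementation of LoFTR and OpenCV's MAGSAC++ backend (cv2.USAC_MAGSAC); the confidence threshold and RANSAC parameters are illustrative values rather than the paper's tuned settings, and the test-time augmentation step is omitted.

```python
# A minimal sketch of the LoFTR + MAGSAC++ pipeline described above,
# assuming the kornia implementation of LoFTR and OpenCV's MAGSAC++
# backend (cv2.USAC_MAGSAC). Threshold values are illustrative, not the
# paper's tuned settings; test-time augmentation is omitted.
import cv2
import numpy as np
import torch
import kornia.feature as KF

# Detector-free Transformer matcher with weights trained on outdoor scenes.
matcher = KF.LoFTR(pretrained="outdoor").eval()

def to_tensor(img: np.ndarray) -> torch.Tensor:
    # LoFTR expects a grayscale float tensor of shape (B, 1, H, W) in [0, 1].
    return torch.from_numpy(img)[None, None].float() / 255.0

def match_pair(img0: np.ndarray, img1: np.ndarray, conf_thresh: float = 0.3):
    """Return the fundamental matrix relating two grayscale views."""
    with torch.inference_mode():
        out = matcher({"image0": to_tensor(img0), "image1": to_tensor(img1)})

    kpts0 = out["keypoints0"].cpu().numpy()
    kpts1 = out["keypoints1"].cpu().numpy()
    conf = out["confidence"].cpu().numpy()

    # Step described in the abstract: drop low-confidence correspondences
    # before robust estimation (the 0.3 threshold is a hypothetical choice).
    keep = conf >= conf_thresh
    kpts0, kpts1 = kpts0[keep], kpts1[keep]

    if len(kpts0) < 8:  # minimum correspondences for fundamental-matrix estimation
        return np.zeros((3, 3)), None

    # MAGSAC++ robustly fits the fundamental matrix to the correspondences.
    F, inlier_mask = cv2.findFundamentalMat(
        kpts0, kpts1, cv2.USAC_MAGSAC,
        ransacReprojThreshold=0.25, confidence=0.9999, maxIters=100_000,
    )
    return F, inlier_mask

# Example usage on two grayscale views of the same scene:
# img0 = cv2.imread("view0.jpg", cv2.IMREAD_GRAYSCALE)
# img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
# F, mask = match_pair(img0, img1)
```

The design mirrors the abstract's description: the learned matcher handles wide viewpoint changes without a keypoint detector, while the robust estimator tolerates the outliers that survive confidence filtering.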