DeepArUco++：在极具挑战性的照明条件下改进方形靶标的检测

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2024-10-30 DOI:10.1016/j.imavis.2024.105313

Rafael Berral-Soler , Rafael Muñoz-Salinas , Rafael Medina-Carnicer , Manuel J. Marín-Jiménez

{"title":"DeepArUco++：在极具挑战性的照明条件下改进方形靶标的检测","authors":"Rafael Berral-Soler , Rafael Muñoz-Salinas , Rafael Medina-Carnicer , Manuel J. Marín-Jiménez","doi":"10.1016/j.imavis.2024.105313","DOIUrl":null,"url":null,"abstract":"<div><div>Fiducial markers are a computer vision tool used for object pose estimation and detection. These markers are highly useful in fields such as industry, medicine and logistics. However, optimal lighting conditions are not always available, and other factors such as blur or sensor noise can affect image quality. Classical computer vision techniques that precisely locate and decode fiducial markers often fail under difficult illumination conditions (e.g. extreme variations of lighting within the same frame). Hence, we propose DeepArUco++, a deep learning-based framework that leverages the robustness of Convolutional Neural Networks to perform marker detection and decoding in challenging lighting conditions. The framework is based on a pipeline using different Neural Network models at each step, namely marker detection, corner refinement and marker decoding. Additionally, we propose a simple method for generating synthetic data for training the different models that compose the proposed pipeline, and we present a second, real-life dataset of ArUco markers in challenging lighting conditions used to evaluate our system. The developed method outperforms other state-of-the-art methods in such tasks and remains competitive even when testing on the datasets used to develop those methods. Code available in GitHub: <span><span>https://github.com/AVAuco/deeparuco/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"152 ","pages":"Article 105313"},"PeriodicalIF":4.2000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions\",\"authors\":\"Rafael Berral-Soler , Rafael Muñoz-Salinas , Rafael Medina-Carnicer , Manuel J. Marín-Jiménez\",\"doi\":\"10.1016/j.imavis.2024.105313\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Fiducial markers are a computer vision tool used for object pose estimation and detection. These markers are highly useful in fields such as industry, medicine and logistics. However, optimal lighting conditions are not always available, and other factors such as blur or sensor noise can affect image quality. Classical computer vision techniques that precisely locate and decode fiducial markers often fail under difficult illumination conditions (e.g. extreme variations of lighting within the same frame). Hence, we propose DeepArUco++, a deep learning-based framework that leverages the robustness of Convolutional Neural Networks to perform marker detection and decoding in challenging lighting conditions. The framework is based on a pipeline using different Neural Network models at each step, namely marker detection, corner refinement and marker decoding. Additionally, we propose a simple method for generating synthetic data for training the different models that compose the proposed pipeline, and we present a second, real-life dataset of ArUco markers in challenging lighting conditions used to evaluate our system. The developed method outperforms other state-of-the-art methods in such tasks and remains competitive even when testing on the datasets used to develop those methods. Code available in GitHub: <span><span>https://github.com/AVAuco/deeparuco/</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"152 \",\"pages\":\"Article 105313\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2024-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885624004189\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624004189","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

靶标是一种计算机视觉工具，用于物体姿态估计和检测。这些标记在工业、医疗和物流等领域非常有用。然而，最佳照明条件并不总是存在，模糊或传感器噪声等其他因素也会影响图像质量。精确定位和解码靶标的经典计算机视觉技术在光照条件恶劣的情况下（如同一帧内光照变化极大）往往会失效。因此，我们提出了 DeepArUco++，这是一种基于深度学习的框架，利用卷积神经网络的鲁棒性，在具有挑战性的光照条件下执行标记检测和解码。该框架基于一个流水线，每个步骤都使用不同的神经网络模型，即标记检测、角细化和标记解码。此外，我们还提出了一种生成合成数据的简单方法，用于训练构成拟议流水线的不同模型，我们还展示了在具有挑战性的照明条件下 ArUco 标记的第二个真实数据集，用于评估我们的系统。所开发的方法在此类任务中的表现优于其他最先进的方法，即使在用于开发这些方法的数据集上进行测试，也能保持竞争力。代码见 GitHub：https://github.com/AVAuco/deeparuco/。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions

查看原文本刊更多论文

DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions

Fiducial markers are a computer vision tool used for object pose estimation and detection. These markers are highly useful in fields such as industry, medicine and logistics. However, optimal lighting conditions are not always available, and other factors such as blur or sensor noise can affect image quality. Classical computer vision techniques that precisely locate and decode fiducial markers often fail under difficult illumination conditions (e.g. extreme variations of lighting within the same frame). Hence, we propose DeepArUco++, a deep learning-based framework that leverages the robustness of Convolutional Neural Networks to perform marker detection and decoding in challenging lighting conditions. The framework is based on a pipeline using different Neural Network models at each step, namely marker detection, corner refinement and marker decoding. Additionally, we propose a simple method for generating synthetic data for training the different models that compose the proposed pipeline, and we present a second, real-life dataset of ArUco markers in challenging lighting conditions used to evaluate our system. The developed method outperforms other state-of-the-art methods in such tasks and remains competitive even when testing on the datasets used to develop those methods. Code available in GitHub: https://github.com/AVAuco/deeparuco/.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.