A Diffusion-Based Data Generator for Training Object Recognition Models in Ultra-Range Distance

IF 4.6, Zone 2 Computer Science, JCR Q2 ROBOTICS

IEEE Robotics and Automation Letters Pub Date : 2024-11-14 DOI:10.1109/LRA.2024.3498774

Eran Bamani;Eden Nissinman;Lisa Koenigsberg;Inbar Meir;Avishai Sintov

IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11722-11729, 2024. Journal Article. https://ieeexplore.ieee.org/document/10753005/

Citation count: 0

Abstract

Object recognition, commonly performed with a camera, is a fundamental requirement for robots to complete complex tasks. Some tasks require recognizing objects far from the robot's camera. A challenging example is Ultra-Range Gesture Recognition (URGR) in human-robot interaction, where the user exhibits directive gestures at a distance of up to 25 m from the robot. However, training a model to recognize barely visible objects at ultra-range requires collecting a large number of labeled samples. Generating synthetic training datasets is a recent solution to the lack of real-world data, but it fails to properly replicate the realistic visual characteristics of distant objects in images. In this letter, we propose the Diffusion in Ultra-Range (DUR) framework, based on a diffusion model, to generate labeled images of distant objects in various scenes. The DUR generator receives a desired distance and class (e.g., gesture) and outputs a corresponding synthetic image. We apply DUR to train a URGR model with directive gestures in which fine details of the gesturing hand are challenging to distinguish. DUR is compared to other types of generative models and shows superiority both in fidelity and in recognition success rate when training a URGR model. More importantly, training a DUR model on a limited amount of real data and then using it to generate synthetic data for training a URGR model outperforms directly training the URGR model on real data. The synthetic-based URGR model is also demonstrated in gesture-based direction of a ground robot.
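To make the generator interface described in the abstract concrete, the following is a minimal sketch of how a generator conditioned on distance and class could be used to assemble a labeled synthetic dataset. It is not the authors' implementation: `placeholder_generator`, `build_synthetic_dataset`, the `Sample` type, the gesture names, and the 1 m lower distance bound are all assumptions made for illustration, and the placeholder returns random pixels instead of sampling from a trained diffusion model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # placeholder image type: rows of grayscale pixels


@dataclass
class Sample:
    image: Image
    distance_m: float  # distance the generator was conditioned on
    label: str         # class the generator was conditioned on, reused as the label


def placeholder_generator(distance_m: float, label: str, size: int = 8) -> Image:
    """Hypothetical stand-in for a trained conditional sampler (not the DUR model).

    Returns random pixels so the sketch runs without ML dependencies.
    """
    rng = random.Random(f"{label}:{distance_m:.3f}")
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def build_synthetic_dataset(
    generate: Callable[[float, str], Image],
    classes: List[str],
    per_class: int,
    min_distance_m: float = 1.0,   # assumed lower bound, not from the abstract
    max_distance_m: float = 25.0,  # upper bound quoted in the abstract
    seed: int = 0,
) -> List[Sample]:
    """Request per_class images for every class at random distances."""
    rng = random.Random(seed)
    dataset = []
    for label in classes:
        for _ in range(per_class):
            distance = rng.uniform(min_distance_m, max_distance_m)
            # The conditioning inputs double as the annotation for each image.
            dataset.append(Sample(generate(distance, label), distance, label))
    return dataset


if __name__ == "__main__":
    gestures = ["pointing", "stop", "beckoning"]  # hypothetical class names
    data = build_synthetic_dataset(placeholder_generator, gestures, per_class=4)
    print(len(data), "synthetic samples")
```

Because the class and distance are chosen before generation, every synthetic image arrives already labeled, which is the property that removes the manual annotation step for the downstream recognition model.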

Source journal

IEEE Robotics and Automation Letters Computer Science-Computer Science Applications

CiteScore

9.60

Self-citation rate

15.40%

Articles published

1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.