Optimal Input Selection for Single Object Tracking using RGB-Thermal Camera

2022 IEEE 12th Symposium on Computer Applications & Industrial Electronics (ISCAIE). Pub Date: 2022-05-21. DOI: 10.1109/iscaie54458.2022.9794503

Siti Raihanah Abdani, Mohd Asyraf Zulkifley



Abstract

Object tracking is used in a wide range of intelligent applications, including surveillance, autonomous cars, smart harvesting, and action recognition systems. In a video-based setting, an object tracking algorithm aims to associate the object of interest across frames by building its movement trajectory. The most popular sensing input for tracking algorithms is the RGB channels, yet RGB input performs relatively poorly in low-light surroundings, especially when the object’s appearance is similar to the background. Therefore, multi-modal input that combines RGB and thermal images has been explored to overcome the weakness of single-modality input. For a tracker that is based on the scoring output of convolutional neural networks, pre-trained weights are usually used for the feature extraction module. It is the norm that the weights in the convolutional layers are frozen, while parameter fitting is done only in the fully connected layers. Since the weights are pre-trained, the optimal number of input channels is only three, which poses a problem for a tracker with RGB-Thermal input. Two schemes are devised in this work: either slice the pre-trained weights to accommodate an additional thermal channel, or duplicate the thermal channel into a three-channel format. The performance of the resulting 4D and 6D inputs is tested on three state-of-the-art trackers: MDNet, TCNN, and MMCNN. The best performance was produced by TCNN-4D, with an expected average overlap of 0.2534, accuracy of 0.5963, and reliability of 0.9329. The results indicate that an optimized slicing method that selects the best pre-trained weights produces a significant tracking improvement even when fewer input channels are used.

Index Terms: Single Object Tracking, RGB-Thermal Camera, Convolutional Neural Networks, Optimal Input Selection
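The two input schemes named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the (out, in, kH, kW) kernel layout, and the idea of reusing one input-channel slice of the pre-trained first-layer kernel for the thermal channel are assumptions made here for illustration, and the abstract does not say how the best slice is selected.

```python
import numpy as np

def stack_4d(rgb, thermal):
    """4D input: RGB (H, W, 3) plus one thermal channel (H, W) -> (H, W, 4)."""
    return np.concatenate([rgb, thermal[..., None]], axis=-1)

def stack_6d(rgb, thermal):
    """6D input: thermal duplicated into three channels, then stacked -> (H, W, 6)."""
    thermal3 = np.repeat(thermal[..., None], 3, axis=-1)
    return np.concatenate([rgb, thermal3], axis=-1)

def kernel_for_4d(w_pretrained, thermal_slice):
    """Assumed slicing scheme: append one input-channel slice of the
    pre-trained kernel (out, 3, kH, kW) to serve the thermal channel,
    giving (out, 4, kH, kW). `thermal_slice` (0, 1 or 2) is hypothetical."""
    extra = w_pretrained[:, thermal_slice:thermal_slice + 1]
    return np.concatenate([w_pretrained, extra], axis=1)

def kernel_for_6d(w_pretrained):
    """Assumed duplication scheme: reuse the full three-channel kernel for
    the duplicated thermal channels, giving (out, 6, kH, kW)."""
    return np.concatenate([w_pretrained, w_pretrained], axis=1)
```

In this sketch the 4D variant keeps the input small and leaves the choice of `thermal_slice` as the quantity to optimize, which is one possible reading of the "optimized slicing method" mentioned in the abstract.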
