Optimal Input Selection for Single Object Tracking using RGB-Thermal Camera

Siti Raihanah Abdani, Mohd Asyraf Zulkifley
Published in: 2022 IEEE 12th Symposium on Computer Applications & Industrial Electronics (ISCAIE)
Publication date: 2022-05-21
DOI: 10.1109/iscaie54458.2022.9794503 (https://doi.org/10.1109/iscaie54458.2022.9794503)

Abstract

Object tracking is used in various intelligent applications, including surveillance, autonomous cars, smart harvesting, and action recognition systems. In a video-based setting, an object tracking algorithm aims to associate the object of interest across frames by building its movement trajectory. The most popular sensing input for tracking algorithms is RGB, yet it performs relatively poorly in low-light surroundings, especially if the object's appearance is similar to the background. Therefore, multi-modal input that combines RGB and thermal images has been explored to overcome the weakness of a single-modality input. For a tracker that is based on the scoring output of convolutional neural networks, pre-trained weights are usually used for the feature extraction module. It is the norm that the weights in the convolutional layers are frozen, while parameter fitting is done only in the fully connected layers. Since the weights are pre-trained, the optimal number of input channels is only three, which poses a problem for a tracker with RGB-Thermal input. Two schemes have been devised in this work: either slice the pre-trained weights to accommodate an additional thermal channel, or duplicate the thermal channel into a three-channel format. Hence, the performance of 4D (four-channel) and 6D (six-channel) inputs is tested on three state-of-the-art trackers: MDNet, TCNN, and MMCNN. The best result was produced by TCNN-4D, with an expected average overlap of 0.2534, accuracy of 0.5963, and reliability of 0.9329. The results indicate that an optimized slicing method that selects the best pre-trained weights will produce a significant tracking improvement even if fewer input channels are used.

Index Terms: Single Object Tracking, RGB-Thermal Camera, Convolutional Neural Networks, Optimal Input Selection
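
The two input schemes described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the pre-trained weights are sliced or which slice is selected, so the sketch assumes that a 4D input reuses one input-channel slice of the pre-trained first-layer kernel for the thermal channel, and that a 6D input reuses the whole three-channel kernel for the triplicated thermal image. All function names, the `src_channel` parameter, and the array layouts (H, W, C for images; out, in, k, k for kernels) are hypothetical.

```python
import numpy as np


def build_4d_input(rgb, thermal):
    """Stack an RGB image (H, W, 3) with a thermal image (H, W) into (H, W, 4)."""
    return np.concatenate([rgb, thermal[..., None]], axis=-1)


def build_6d_input(rgb, thermal):
    """Duplicate the thermal image into three channels and stack it: (H, W, 6)."""
    thermal3 = np.repeat(thermal[..., None], 3, axis=-1)
    return np.concatenate([rgb, thermal3], axis=-1)


def extend_weights_4d(w_rgb, src_channel):
    """Assumed slicing scheme: append a copy of one pre-trained input-channel
    slice so that a (out, 3, k, k) kernel becomes (out, 4, k, k)."""
    thermal_slice = w_rgb[:, src_channel:src_channel + 1]
    return np.concatenate([w_rgb, thermal_slice], axis=1)


def extend_weights_6d(w_rgb):
    """Assumed duplication scheme: reuse all three pre-trained slices for the
    triplicated thermal channels, giving a (out, 6, k, k) kernel."""
    return np.concatenate([w_rgb, w_rgb], axis=1)
```

Under these assumptions, the choice of `src_channel` is the quantity an "optimized slicing method" would have to select, for instance by comparing tracking scores for each of the three candidate slices. The convolutional weights themselves stay frozen in both schemes, as the abstract describes.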