Dynamic Multi-Loss Weighting for Multiple People Tracking in Video Surveillance Systems

2021 IEEE 19th International Conference on Industrial Informatics (INDIN). Pub Date: 2021-07-21. DOI: 10.1109/INDIN45523.2021.9557515

Xuan-Thuy Vo, T. Tran, Duy-Linh Nguyen, K. Jo


Citations: 2

Abstract

Multiple people tracking is a fundamental yet challenging task in computer vision that serves as a primary step for high-level tasks such as human behavior analysis, action recognition, and pose estimation. Person tracking is decomposed into detection and re-identification (re-ID) sub-tasks. Conventionally, the detection sub-task learns classification and regression objectives simultaneously, and the re-ID sub-task is treated as a classification task. Person tracking is therefore a multi-task learning problem with multiple loss functions (multiple objectives): one bounding box regression and two classifications. The tasks differ in three ways: the ranges of the objectives are inconsistent, the contribution of each task to the overall gradient differs, and the learning pace of each task differs with its level of difficulty. These differences lead to an objective imbalance in multi-task learning. Previous methods introduced weighting factors as new hyper-parameters to balance the ranges of the tasks. The search space for manually tuning these hyper-parameters is high-dimensional, and its dimension depends on the number of tasks, so selecting reasonable weighting factors is difficult. This paper introduces dynamic multi-loss weighting (DMW), a simple but effective method in which the weighting factors change dynamically during training without introducing any hyper-parameters. The dynamic weights are optimized to balance the regression and classification objectives, and they depend on the difficulty level of each task and the correlation between tasks. Additionally, general convolution operations are spatially invariant to some degree, which hinders the network's performance. Hence, this work employs a position-sensitive operation to improve feature extraction. The proposed method is evaluated on the challenging MOT17 benchmark, where it outperforms other online multiple people trackers without using additional data.
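The objective imbalance described in the abstract can be illustrated with a small sketch. The abstract does not give the DMW formula, so the rule below is a stand-in chosen only for illustration: each task is weighted by the inverse of its current loss value, and the weights are rescaled to sum to the number of tasks. The function names, the task names (`det_cls`, `det_reg`, `reid_cls`), and the loss values are all hypothetical, and the rule ignores the task-correlation term that the abstract mentions.

```python
from typing import Dict


def static_total(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Baseline: fixed, hand-tuned weights, i.e. one hyper-parameter per task."""
    return sum(weights[name] * value for name, value in losses.items())


def dynamic_weights(losses: Dict[str, float], eps: float = 1e-12) -> Dict[str, float]:
    """Illustrative rule only (NOT the paper's DMW formula).

    Each task is weighted by the inverse of its current loss magnitude, so
    objectives with a large range do not dominate the sum. The weights are
    rescaled to sum to the number of tasks. `eps` is a numerical guard
    against division by zero, not a tuning knob.
    """
    inverse = {name: 1.0 / (abs(value) + eps) for name, value in losses.items()}
    scale = len(losses) / sum(inverse.values())
    return {name: scale * inv for name, inv in inverse.items()}


def dynamic_total(losses: Dict[str, float]) -> float:
    """Weighted sum using weights recomputed from the current loss values."""
    weights = dynamic_weights(losses)
    return sum(weights[name] * value for name, value in losses.items())


if __name__ == "__main__":
    # Hypothetical loss values for one training step: two classification
    # losses (detection and re-ID) and one bounding box regression loss.
    step_losses = {"det_cls": 0.8, "det_reg": 4.0, "reid_cls": 6.0}
    print("weights:", dynamic_weights(step_losses))
    print("dynamic total:", dynamic_total(step_losses))
    print("static total:", static_total(step_losses, {k: 1.0 for k in step_losses}))
```

With this rule every task contributes the same amount to the total at each step, whatever its raw range. In a training framework the weights would normally be computed from detached loss values so that they act as constants during back-propagation; that detail is also an assumption, not something stated in the abstract.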



