Title: Dynamic Multi-Loss Weighting for Multiple People Tracking in Video Surveillance Systems
Authors: Xuan-Thuy Vo, T. Tran, Duy-Linh Nguyen, K. Jo
Venue: 2021 IEEE 19th International Conference on Industrial Informatics (INDIN)
Publication date: 2021-07-21
DOI: 10.1109/INDIN45523.2021.9557515
URL: https://doi.org/10.1109/INDIN45523.2021.9557515
Citations: 2
Abstract
Multiple people tracking is a fundamental yet challenging task in computer vision that serves as a primary step for high-level tasks such as human behavior analysis, action recognition, and pose estimation. Person tracking is decomposed into detection and re-identification (re-ID) sub-tasks. Conventionally, the detection sub-task learns classification and regression objectives simultaneously, and the re-ID sub-task is treated as a classification task. Person tracking is therefore a multi-task learning problem with multiple loss functions (multiple objectives): one bounding box regression and two classifications. The tasks differ in three ways: the ranges of the objectives are inconsistent, the contribution of each task to the overall gradient varies, and the learning pace of each task is different (level of difficulty). These differences lead to an objective imbalance in multi-task learning. Previous methods introduced weighting factors as new hyper-parameters to balance the ranges of the tasks. The search space for manually tuning these hyper-parameters is high-dimensional and grows with the number of tasks, so selecting reasonable weighting factors is difficult and complicated. This paper introduces dynamic multi-loss weighting (DMW), a simple but effective method in which the weighting factors change dynamically during training without introducing any hyper-parameters. The dynamic weights are optimized to balance the regression and classification objectives, depending on the difficulty level of each task and the correlation between tasks. Additionally, general convolution operations are spatially invariant to some degree, which hinders the network’s performance. Hence, this work employs a position-sensitive operation to improve feature extraction. The proposed method is evaluated on the challenging MOT17 benchmark, where it outperforms online multiple people trackers without using additional data.
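
To make the objective imbalance concrete, the following is a minimal sketch of one generic way to rescale several losses at each training step so that objectives with very different ranges contribute equally. It is not the paper's DMW, whose formulation is not given in this abstract; it substitutes a plain inverse-magnitude normalization as a stand-in. The function names, the task names, and the loss values are all hypothetical, and `eps` is only a numerical guard against division by zero.

```python
def inverse_magnitude_weights(losses, eps=1e-12):
    """Return one weight per task, proportional to 1 / loss.

    Illustrative stand-in only (NOT the DMW formulation from the paper).
    The weights are rescaled to sum to the number of tasks, which makes
    every weighted loss take the same value at the current step.
    """
    inverse = {name: 1.0 / max(value, eps) for name, value in losses.items()}
    scale = len(losses) / sum(inverse.values())
    return {name: scale * inv for name, inv in inverse.items()}


def combined_loss(losses):
    """Weighted sum of the task losses using the weights above."""
    weights = inverse_magnitude_weights(losses)
    return sum(weights[name] * losses[name] for name in losses)


# Hypothetical per-step loss values for the three objectives named in the
# abstract: detection classification, box regression, re-ID classification.
step_losses = {"det_cls": 0.5, "box_reg": 4.0, "reid_cls": 8.0}
weights = inverse_magnitude_weights(step_losses)
weighted = {name: weights[name] * step_losses[name] for name in step_losses}
total = combined_loss(step_losses)
```

In this toy setting the objective with the largest raw value receives the smallest weight, so no single task dominates the sum. In a real training loop the weights would normally be computed from detached loss values and treated as constants during backpropagation; that detail is also an assumption here, not something stated in the source.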