Locality and Relative Distance-Aware Non-local Networks for Hand-Raising Detection in Classroom Video
Thu-Hien Le, Hoang-Nhat Tran, Phuong-Dung Nguyen, Hong-Quan Nguyen, Thuy-Binh Nguyen, Thi-Lan Le, Thanh-Hai Tran, Hai Vu
2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), October 2021
DOI: 10.1109/MAPR53640.2021.9585284
Citations: 1
Abstract
Detecting and understanding interactions between students and teachers in the classroom is an important capability for computer vision-based educational assistive systems. Recently, techniques that model long-range spatial dependencies, such as non-local networks, have proven very effective for such tasks. For global context generation, however, our analysis shows that the non-local operation compares pixels only by their values and therefore cannot capture structural information. In this paper, we first extend the non-local module to incorporate locality attributes. We further observe that every query is treated uniformly when the attention map is generated. Hence, we incorporate distance-wise representations, with an efficient implementation, into the non-local formulation. The proposed locality and relative distance-aware non-local module is integrated into the Libra-RCNN object detection architecture and evaluated on a pre-access hand-raising gesture dataset. This straightforward modification improves on the baseline Libra-RCNN model by 0.8% in mAP@0.5 and by 2.8% in mAP@0.75.
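The abstract's central observation is that a plain non-local operation scores pixel pairs only by their feature values, so two pixels with identical features receive identical attention no matter where they sit. The minimal pure-Python sketch below illustrates that point and one hypothetical way to add a relative-distance term. It is not the authors' module: it assumes identity embeddings (no learned projections), the Gaussian (dot product followed by softmax) variant of the non-local block, and a Euclidean distance penalty scaled by a hypothetical `distance_weight` parameter. The function names `attention_weights` and `non_local` are illustrative only, and the locality extension is not modeled.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Position = Tuple[int, int]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_weights(
    features: Sequence[Vector],
    query: int,
    positions: Optional[Sequence[Position]] = None,
    distance_weight: float = 0.0,
) -> List[float]:
    """Attention of one query pixel over all pixels.

    With distance_weight == 0 (or no positions), each score depends only on
    feature values, as in a plain non-local block. A positive distance_weight
    subtracts a HYPOTHETICAL relative-distance penalty (illustrative only).
    """
    logits = []
    for j, f in enumerate(features):
        score = dot(features[query], f)  # value-based similarity
        if positions is not None and distance_weight:
            (ri, ci), (rj, cj) = positions[query], positions[j]
            score -= distance_weight * math.hypot(ri - rj, ci - cj)
        logits.append(score)
    return softmax(logits)


def non_local(
    features: Sequence[Vector],
    positions: Optional[Sequence[Position]] = None,
    distance_weight: float = 0.0,
) -> List[Vector]:
    """Simplified non-local block: z_i = x_i + sum_j a_ij * x_j."""
    outputs = []
    for i, x_i in enumerate(features):
        w = attention_weights(features, i, positions, distance_weight)
        # Weighted sum of all feature vectors, one channel at a time.
        y_i = [sum(w[j] * features[j][k] for j in range(len(features)))
               for k in range(len(x_i))]
        outputs.append([a + b for a, b in zip(x_i, y_i)])  # residual connection
    return outputs


if __name__ == "__main__":
    # Three pixels with identical features at different locations.
    feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    pos = [(0, 0), (0, 1), (0, 5)]
    print(attention_weights(feats, 0))                              # value-only
    print(attention_weights(feats, 0, pos, distance_weight=0.5))    # distance-aware
```

With identical feature vectors, the value-only call gives the adjacent pixel at (0, 1) and the distant pixel at (0, 5) exactly the same weight, while the distance-aware call favors the nearer one. This is the structural blindness the abstract describes, shown under the assumptions stated above.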