Thu-Hien Le, Hoang-Nhat Tran, Phuong-Dung Nguyen, Hong-Quan Nguyen, Thuy-Binh Nguyen, Thi-Lan Le, Thanh-Hai Tran, Hai Vu
2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), published 2021-10-01. DOI: 10.1109/MAPR53640.2021.9585284 (https://doi.org/10.1109/MAPR53640.2021.9585284)
Locality and Relative Distance-Aware Non-local Networks for Hand-Raising Detection in Classroom Video
Detecting and understanding interactions between students and teachers in the classroom is an important requirement for computer vision-based educational assistive systems. Recently, techniques for modeling deep long-range spatial dependencies, such as non-local networks, have proven very effective for such tasks. However, when generating global context, the non-local operation compares pixels using only their values, which cannot capture structural information. In this paper, we first extend the non-local module to incorporate locality attributes. We further observe that every query is treated uniformly when generating the attention map. Hence, we incorporate distance-wise representations, with an efficient implementation, into the non-local formulas. The proposed locality and relative distance-aware non-local module is integrated into an object detection architecture, namely Libra-RCNN, and is evaluated on a pre-access hand-raising gesture dataset. Our straightforward modification achieves 0.8% and 2.8% higher performance than the baseline Libra-RCNN model in terms of mAP 0.5 and mAP 0.75, respectively.
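
To make the idea of a distance-aware non-local operation concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a plain dot-product similarity without learned projections and a fixed linear penalty on Euclidean distance; the function name `nonlocal_with_distance` and the weight `lam` are hypothetical and introduced only for illustration. Setting `lam=0` recovers a value-only comparison, in which position plays no role.

```python
import math


def nonlocal_with_distance(feats, height, width, lam=0.5):
    """Toy non-local aggregation over an H x W feature map.

    feats: list of H*W feature vectors in row-major order.
    lam:   hypothetical weight of the relative-distance penalty
           (an assumption for this sketch; lam=0 ignores position).
    Returns (output features, attention weights).
    """
    n = height * width
    outputs, attention = [], []
    for i in range(n):
        yi, xi = divmod(i, width)
        logits = []
        for j in range(n):
            yj, xj = divmod(j, width)
            # Content term: dot product of the two feature vectors.
            sim = sum(a * b for a, b in zip(feats[i], feats[j]))
            # Position term: Euclidean distance between the two pixels.
            dist = math.hypot(yi - yj, xi - xj)
            logits.append(sim - lam * dist)
        # Softmax over all positions j, shifted for numerical stability.
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of features plus a residual connection.
        agg = [sum(w * feats[j][c] for j, w in enumerate(weights))
               for c in range(len(feats[i]))]
        outputs.append([f + a for f, a in zip(feats[i], agg)])
        attention.append(weights)
    return outputs, attention


if __name__ == "__main__":
    # A 2 x 2 map whose pixels all share the same feature vector.
    feats = [[1.0, 0.0]] * 4
    _, plain = nonlocal_with_distance(feats, 2, 2, lam=0.0)
    _, aware = nonlocal_with_distance(feats, 2, 2, lam=1.0)
    print(plain[0])
    print(aware[0])
```

With identical features at every pixel, the value-only variant assigns the same weight to every position, whereas the distance-penalised variant gives more weight to nearer pixels. This illustrates why comparing values alone cannot encode spatial structure.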