Locality and Relative Distance-Aware Non-local Networks for Hand-Raising Detection in Classroom Video

2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR). Pub Date: 2021-10-01. DOI: 10.1109/MAPR53640.2021.9585284

Thu-Hien Le, Hoang-Nhat Tran, Phuong-Dung Nguyen, Hong-Quan Nguyen, Thuy-Binh Nguyen, Thi-Lan Le, Thanh-Hai Tran, Hai Vu


Citations: 1

Abstract


Locality and Relative Distance-Aware Non-local Networks for Hand-Raising Detection in Classroom Video

Detecting and understanding interactions between students and teachers in the classroom is an important requirement for computer vision-based educational assistive systems. Recently, techniques that model long-range spatial dependencies in deep networks, such as non-local networks, have proven very effective for such tasks. However, when generating global context, the non-local operation compares pixels using only their values and therefore does not capture structural information. In this paper, we first extend the non-local module to incorporate locality attributes. We further observe that every query is treated uniformly when the attention map is generated. Hence, we incorporate distance-wise representations, with an efficient implementation, into the non-local formulas. The proposed locality and relative distance-aware non-local module is integrated into the Libra-RCNN object detection architecture and evaluated on a pre-access hand-raising gesture dataset. This straightforward modification achieves 0.8% and 2.8% higher performance than the baseline Libra-RCNN model in terms of mAP 0.5 and mAP 0.75, respectively.
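The abstract does not give the modified non-local formulas, so the following is only a minimal sketch of the general idea and not the authors' formulation. It assumes a plain dot-product affinity for the content term and subtracts a Euclidean grid-distance penalty before the softmax. The function name `non_local`, the parameter `distance_weight`, and the form of the penalty are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(features, height, width, distance_weight=0.0):
    """Non-local aggregation over a flattened height x width feature map.

    features: list of length height*width, each a list of C floats.
    distance_weight: hypothetical knob (not from the paper); 0.0 gives the
    plain dot-product non-local operation, larger values penalise
    positions that are far from the query.
    Returns (outputs, attention), where attention[i][j] is the weight
    that query position i assigns to key position j.
    """
    assert len(features) == height * width
    coords = [(r, c) for r in range(height) for c in range(width)]
    outputs, attention = [], []
    for i, query in enumerate(features):
        scores = []
        for j, key in enumerate(features):
            # Content term: compares feature values only.
            content = sum(q * k for q, k in zip(query, key))
            # Assumed distance term: Euclidean distance between grid cells.
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            scores.append(content - distance_weight * dist)
        weights = softmax(scores)
        attention.append(weights)
        # Weighted sum of all positions' features, channel by channel.
        outputs.append([
            sum(w * features[j][ch] for j, w in enumerate(weights))
            for ch in range(len(query))
        ])
    return outputs, attention


if __name__ == "__main__":
    # 1 x 3 map whose two end cells carry identical features.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    _, plain = non_local(feats, 1, 3)
    _, aware = non_local(feats, 1, 3, distance_weight=1.0)
    print(plain[0])  # cells 0 and 2 receive equal weight
    print(aware[0])  # cell 2 is down-weighted because it is farther away
```

With `distance_weight=0.0`, two positions with identical features receive identical attention regardless of where they sit in the image, which is the value-only comparison the abstract criticises. A positive `distance_weight` makes the attention map depend on where each query is located.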
