Kaidong Li, Ningning Wang, Yiju Yang, Guanghui Wang
{"title":"SGNet: A Super-class Guided Network for Image Classification and Object Detection","authors":"Kaidong Li, Ningning Wang, Yiju Yang, Guanghui Wang","doi":"10.1109/CRV52889.2021.00025","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00025","url":null,"abstract":"Most classification models treat different object classes in parallel and the misclassifications between any two classes are treated equally. In contrast, human beings can exploit high-level information in making a prediction of an unknown object. Inspired by this observation, the paper proposes a super-class guided network (SGNet) to integrate the high-level semantic information into the network so as to increase its performance in inference. SGNet takes two-level class annotations that contain both super-class and finer class labels. The super-classes are higher-level semantic categories that consist of a certain amount of finer classes. A super-class branch (SCB), trained on super-class labels, is introduced to guide finer class prediction. At the inference time, we adopt two different strategies: Two-step inference (TSI) and direct inference (DI). TSI first predicts the super-class and then makes predictions of the corresponding finer class. On the other hand, DI directly generates predictions from the finer class branch (FCB). Extensive experiments have been performed on CIFAR-100 and MS COCO datasets. The experimental results validate the proposed approach and demonstrate its superior performance on image classification and object detection.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130580736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network","authors":"A. Abedi, Shehroz S. Khan","doi":"10.1109/CRV52889.2021.00028","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00028","url":null,"abstract":"Automatic detection of students' engagement in online learning settings is a key element to improve the quality of learning and to deliver personalized learning materials to them. Varying levels of engagement exhibited by students in an online classroom is an affective behavior that takes place over space and time. Therefore, we formulate detecting levels of students' engagement from videos as a spatio-temporal classification problem. In this paper, we present a novel end-to-end Residual Network (ResNet) and Temporal Convolutional Network (TCN) hybrid neural network architecture for students' engagement level detection in videos. The 2D ResNet extracts spatial features from consecutive video frames, and the TCN analyzes the temporal changes in video frames to detect the level of engagement. The spatial and temporal arms of the hybrid network are jointly trained on raw video frames of a large publicly available students' engagement detection dataset, DAiSEE. We compared our method with several competing students' engagement detection methods on this dataset. The ResNet+TCN architecture outperforms all other studied methods, improves the state-of-the-art engagement level detection accuracy, and sets a new baseline for future research.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114536152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Point Transformation Methods For Self-Supervised Depth Prediction","authors":"Ziwen Chen, Zixuan Guo, Jerod J. Weinman","doi":"10.1109/CRV52889.2021.00023","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00023","url":null,"abstract":"Given stereo or egomotion image pairs, a popular and successful method for unsupervised learning of monocular depth estimation is to measure the quality of image reconstructions resulting from the learned depth predictions. Continued research has improved the overall approach in recent years, yet the common framework still suffers from several important limitations, particularly when dealing with points occluded after transformation to a novel viewpoint. While prior work has addressed the problem heuristically, this paper introduces a z-buffering algorithm that correctly and efficiently handles occluded points. Because our algorithm is implemented with operators typical of machine learning libraries, it can be incorporated into any existing unsupervised depth learning framework with automatic support for differentiation. Additionally, because points having negative depth after transformation often signify erroneously shallow depth predictions, we introduce a loss function to explicitly penalize this undesirable behavior. Experimental results on the KITTI data set show that the z-buffer and negative depth loss both improve the performance of a state of the art depth-prediction network. The code is available at https://github.com/arthurhero/ZbuffDepth and archived at https://hdl.handle.net/11084/10450.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131586575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relatively Lazy: Indoor-Outdoor Navigation Using Vision and GNSS","authors":"Benjamin Congram, T. Barfoot","doi":"10.1109/CRV52889.2021.00015","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00015","url":null,"abstract":"Visual Teach and Repeat has shown relative navigation is a robust and efficient solution for autonomous vision-based path following in difficult environments. Adding additional absolute sensors such as Global Navigation Satellite Systems (GNSS) has the potential to expand the domain of Visual Teach and Repeat to environments where the ability to visually localize is not guaranteed. Our method of lazy mapping and delaying estimation until a path-tracking error is needed avoids the need to estimate absolute states. As a result, map optimization is not required and paths can be driven immediately after being taught. We validate our approach on a real robot through an experiment in a joint indoor-outdoor environment comprising 3.5km of autonomous route repeating across a variety of lighting conditions. We achieve smooth error signals throughout the runs despite large sections of dropout for each sensor.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126666237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}