Title: Student Engagement Dataset
Authors: K. Delgado, Juan Manuel Origgi, Tania Hasanpoor, Hao Yu, Danielle A. Allessio, I. Arroyo, William Lee, Margrit Betke, B. Woolf, Sarah Adel Bargal
Venue: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021
DOI: https://doi.org/10.1109/ICCVW54120.2021.00405
Abstract: A major challenge for online learning is the inability of systems to support student emotion and to maintain student engagement. In response to this challenge, computer vision has become an embedded feature in some instructional applications. In this paper, we propose a video dataset of college students solving math problems on the educational platform MathSpring.org, recorded with a front-facing camera that captures students' gestures. The video dataset is annotated to indicate whether students' attention at specific frames is engaged or wandering. In addition, we train baselines for a computer vision module that determines the extent of student engagement during remote learning. Baselines include state-of-the-art deep learning image classifiers and traditional conditional and logistic regression for head pose estimation. We then incorporate a gaze baseline into the MathSpring learning platform and are evaluating its performance against the currently implemented approach.
Title: Weakly-Supervised Semantic Segmentation by Learning Label Uncertainty
Authors: R. Neven, D. Neven, Bert De Brabandere, M. Proesmans, Toon Goedemé
Venue: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021
DOI: https://doi.org/10.1109/ICCVW54120.2021.00193
Abstract: Since the rise of deep learning, many computer vision tasks have seen significant advancements. However, the downside of deep learning is that it is very data-hungry. Especially for segmentation problems, training a deep neural network requires dense supervision in the form of pixel-perfect image labels, which are very costly to obtain. In this paper, we present a new loss function to train a segmentation network with only a small subset of pixel-perfect labels, while taking advantage of weakly annotated training samples in the form of cheap bounding-box labels. Unlike recent works that make use of box-to-mask proposal generators, our loss trains the network to learn label uncertainty within the bounding box, which can be leveraged to perform online bootstrapping (i.e., transforming the boxes into segmentation masks) while training the network. We evaluated our method on binary segmentation tasks as well as a multi-class segmentation task (Cityscapes vehicles and persons). We trained each task on a dataset comprising only 18% pixel-perfect and 82% bounding-box labels, and compared the results to a baseline model trained on a completely pixel-perfect dataset. For the binary segmentation tasks, our method achieves an IoU score that is 98.33% as good as our baseline model, while for the multi-class task, our method is 97.12% as good as our baseline model (77.5 vs. 79.8 mIoU).
Title: Emotion Recognition With Sequential Multi-task Learning Technique
Authors: Phan Tran Dac Thinh, Hoang Manh Hung, Hyung-Jeong Yang, Soohyung Kim, Gueesang Lee
Venue: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021
DOI: https://doi.org/10.1109/ICCVW54120.2021.00400
Abstract: The task of predicting affective information in the wild, such as seven basic emotions or action units, from human faces has gradually become more interesting due to the accessibility and availability of massive annotated datasets. In this study, we propose a method that utilizes the association between the seven basic emotions and twelve action units from the AffWild2 dataset. The method, based on a ResNet50 architecture, uses a multi-task learning technique to handle the incomplete labels of the two tasks. By combining the knowledge of the two correlated tasks, performance on both is improved by a large margin compared to a model employing only one kind of label.
{"title":"A transformer-based framework for automatic COVID19 diagnosis in chest CTs","authors":"Lei Zhang, Yan-mao Wen","doi":"10.1109/ICCVW54120.2021.00063","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00063","url":null,"abstract":"Automated diagnosis of covid19 in chest CTs is becoming a clinically important technique to support precision and efficient diagnosis and treatment planning. A few efforts have been made to automatically diagnose the COVID-19 in CTs using CNNs, and the task still remains a challenge. In this paper, we present a transformer-based framework for COVID19 classification. We attempt to expand the adaption of vision transformer as a robust feature learner to the 3D CTs to diagnose the COVID-19. The framework consists of two main stages: lung segmentation using UNet followed by the classification, in which the features extracted from each CT slice using Swin transformer in a CT scan are aggregated into 3D volume level feature. We also investigated the performance of using the robust CNNs (BiT and EfficientNetV2) as backbones in the framework. The dataset from the ICCV workshop: MIA-COV19D, is used in our experiments. The evaluation results show that the method with the backbone of Swin transformer gain the best F1 score of 0.935 on the validation dataset, while the CNN based backbone of EfficientNetV2 has the competitive classification performance with the best precision of 93.7%. The final prediction model with Swin transformer achieves the F1 score of 0.84 on the test dataset, which doesn’t require an additional post-processing stage.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114881055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ToFNest: Efficient normal estimation for time-of-flight depth cameras","authors":"Szilárd Molnár, Benjamin Kelényi, L. Tamás","doi":"10.1109/ICCVW54120.2021.00205","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00205","url":null,"abstract":"In this work, we propose an efficient normal estimation method for depth images acquired by Time-of-Flight (ToF) cameras based on feature pyramid networks (FPN). We perform the normal estimation starting from the 2D depth images, projecting the measured data into the 3D space and computing the loss function for the point cloud normal. Despite its simplicity, our method called ToFNest proves to be efficient in terms of robustness and runtime. In order to validate ToFNest we performed extensive evaluations using both public and custom outdoor datasets. Compared with the state of the art methods, our algorithm is faster by an order of magnitude without losing precision on public datasets. The demo code is available on https://github.com/molnarszilard/ToFNest","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124049616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Point Cloud Object Segmentation Using Multi Elevation-Layer 2D Bounding-Boxes","authors":"Tristan Brodeur, H. Aliakbarpour, S. Suddarth","doi":"10.1109/ICCVW54120.2021.00438","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00438","url":null,"abstract":"Segmentation of point clouds is a necessary pre-processing technique when object discrimination is needed for scene understanding. In this paper, we propose a segmentation technique utilizing 2D bounding-box data obtained via the orthographic projection of 3D points onto a plane at multiple elevation layers. Connected components is utilized to obtain bounding-box data, and a consistency metric between bounding-boxes at various elevation layers helps determine the classification of the bounding-box to an object of the scene. The merging of point data within each 2D bounding-box results in an object-segmented point cloud. Our method conducts segmentation using only the topological information of the point data within a dataset, requiring no extra computation of normals, creation of an octree or k-d tree, nor a dependency on RGB or intensity data associated with a point. Initial experiments are run on a set of point cloud datasets obtained via photogrammetric means, as well as some open-source, LIDAR-generated point clouds, showing the method to be capture agnostic. Results demonstrate the efficacy of this method in obtaining a distinct set of objects contained within a point cloud.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124123718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Quaternion Pose Proposals for 6D Object Pose Tracking","authors":"Mateusz Majcher, B. Kwolek","doi":"10.1109/ICCVW54120.2021.00032","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00032","url":null,"abstract":"In this work we study quaternion pose distributions for tracking in RGB image sequences the 6D pose of an object selected from a set of objects, for which common models were trained in advance. We propose an unit quaternion representation of the rotational state space for a particle filter, which is then integrated with the particle swarm optimization to shift samples toward local maximas. Owing to k-means++ we better maintain multimodal probability distributions. We train convolutional neural networks to estimate the 2D positions of fiducial points and then to determine PnP-based object pose hypothesis. A CNN is utilized to estimate the positions of fiducial points in order to calculate PnP-based object pose hypothesis. A common Siamese neural network for all objects, which is trained on keypoints from current and previous frame is employed to guide the particles towards predicted pose of the object. Such a key-point based pose hypothesis is injected into the probability distribution that is recursively updated in a Bayesian framework. The 6D object pose tracker is evaluated on Nvidia Jetson AGX Xavier both on synthetic and real sequences of images acquired from a calibrated RGB camera.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126328698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: LiteEdge: Lightweight Semantic Edge Detection Network
Authors: Hao Wang, Hasan Al-Banna Mohamed, Zuowen Wang, Bodo Rueckauer, Shih-Chii Liu
Venue: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021
DOI: https://doi.org/10.1109/ICCVW54120.2021.00300
Abstract: Scene parsing is a critical component for understanding complex scenes in applications such as autonomous driving. Semantic segmentation networks are typically reported for scene parsing, but semantic edge networks have also become of interest because of the sparseness of the segmented maps. This work presents an end-to-end trained, lightweight deep semantic edge detection architecture called LiteEdge, suitable for edge deployment. By utilizing hierarchical supervision and a new weighted multi-label loss function to balance different edge classes during training, LiteEdge predicts category-wise binary edges with high accuracy. Our LiteEdge network, with only ≈3M parameters, achieves a semantic edge prediction accuracy of 52.9% mean maximum F (MF) score on the Cityscapes dataset. This accuracy was evaluated on a network trained to produce a low-resolution edge map. The network can be quantized to 6-bit weights and 8-bit activations with only a 2% drop in the mean MF score. This quantization yields a 6X memory footprint saving on an edge device.
Title: A Semi-self-supervised Learning Approach for Wheat Head Detection using Extremely Small Number of Labeled Samples
Authors: Keyhan Najafian, Ali Ghanbari, I. Stavness, Lingling Jin, G. Shirdel, Farhad Maleki
Venue: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021
DOI: https://doi.org/10.1109/ICCVW54120.2021.00155
Abstract: Most of the success of deep learning is owed to supervised learning, where a large-scale annotated dataset is used for model training. However, developing such datasets is challenging. In this paper, we develop a semi-self-supervised learning approach for wheat head detection. The proposed method utilizes a few short video clips of wheat fields and only one annotated image from each clip to simulate a large, computationally annotated dataset for model building. Considering the domain gap between the simulated and real images, we applied two domain adaptation steps to alleviate the challenge of distributional shift. The resulting model achieved high performance when applied to real unannotated datasets. When fine-tuned on the dataset from the Global Wheat Head Detection Challenge, the performance was further improved. The model achieved a mean average precision of 0.827, where an overlap of 50% or more between a predicted bounding box and the ground truth was considered a correct prediction. Although the utility of the proposed methodology was shown by applying it to wheat head detection, the method is not limited to this application and could be used in other domains, such as detecting different crop types, alleviating the barrier of the lack of large-scale annotated datasets in those domains.
{"title":"Improving Key Human Features for Pose Transfer","authors":"Victor-Andrei Ivan, Ionut Mistreanu, Andrei Leica, Sung-Jun Yoon, Manri Cheon, Junwoo Lee, Jinsoo Oh","doi":"10.1109/ICCVW54120.2021.00223","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00223","url":null,"abstract":"It is still a great challenge in the Pose Transfer task to generate visually coherent images, to preserve the texture of clothes, to maintain the source identity and to realistically generate key human features such as the face or the hands. To tackle these challenges, we first conduct a study to obtain the most robust conditioning labels for this task and the baseline method [44] that we choose. We then improve upon the baseline by including deep source features from an Auto-encoder through an Attention mechanism. Finally we add region discriminators that are focused on key human features, thus obtaining results competitive with the state-of-the-art.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127913666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}