{"title":"Benefits of Synthetically Pre-trained Depth-Prediction Networks for Indoor/Outdoor Image Classification","authors":"Ke Lin, Irene Cho, Ameya S. Walimbe, Bryan A. Zamora, Alex Rich, Sirius Z. Zhang, Tobias Höllerer","doi":"10.1109/WACVW58289.2023.00040","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00040","url":null,"abstract":"Ground truth depth information is necessary for many computer vision tasks. Collecting this information is chal-lenging, especially for outdoor scenes. In this work, we propose utilizing single-view depth prediction neural networks pre-trained on synthetic scenes to generate relative depth, which we call pseudo-depth. This approach is a less expen-sive option as the pre-trained neural network obtains ac-curate depth information from synthetic scenes, which does not require any expensive sensor equipment and takes less time. We measure the usefulness of pseudo-depth from pre-trained neural networks by training indoor/outdoor binary classifiers with and without it. We also compare the difference in accuracy between using pseudo-depth and ground truth depth. We experimentally show that adding pseudo-depth to training achieves a 4.4% performance boost over the non-depth baseline model on DIODE, a large stan-dard test dataset, retaining 63.8% of the performance boost achieved from training a classifier on RGB and ground truth depth. It also boosts performance by 1.3% on another dataset, SUN397, for which ground truth depth is not avail-able. Our result shows that it is possible to take information obtained from a model pre-trained on synthetic scenes and successfully apply it beyond the synthetic domain to real-world data.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116758902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Transformer-based Late-Fusion Mechanism for Fine-Grained Object Recognition in Videos","authors":"Jannik Koch, Stefan Wolf, Jürgen Beyerer","doi":"10.1109/WACVW58289.2023.00015","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00015","url":null,"abstract":"Fine-grained image classification is limited by only considering a single view while in many cases, like surveillance, a whole video exists which provides multiple perspectives. However, the potential of videos is mostly considered in the context of action recognition while fine-grained object recognition is rarely considered as an application for video classification. This leads to recent video classification architectures being inappropriate for the task of fine-grained object recognition. We propose a novel, Transformer-based late-fusion mechanism for fine-grained video classification. Our approach achieves superior results to both early-fusion mechanisms, like the Video Swin Transformer, and a simple consensus-based late-fusion baseline with a modern Swin Transformer backbone. Additionally, we achieve improved efficiency, as our results show a high increase in accuracy with only a slight increase in computational complexity. Code is available at: https://github.com/wolfstefan/tlf.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121309808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subjective and Objective Video Quality Assessment of High Dynamic Range Sports Content","authors":"Zaixi Shang, Yixu Chen, Yongjun Wu, Hai Wei, S. Sethuraman","doi":"10.1109/WACVW58289.2023.00062","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00062","url":null,"abstract":"High Dynamic Range (HDR) video streaming has be-come more popular because of the faithful color and bright-ness presentation. However, the live streaming of HDR, especially of sports content, has unique challenges, as it was usually encoded and distributed in real-time without the post-production workflow. A set of unique problems that occurs only in live streaming, e.g. resolution and frame rate crossover, intra-frame pulsing video quality defects, complex relationship between rate-control mode and video quality, are more salient when the videos are streamed in HDR format. These issues are typically ignored by other subjective databases, disregard the fact that they have a sig-nificant impact on the perceived quality of the videos. In this paper, we present a large-scale HDR video quality dataset for sports content that includes the above mentioned important issues in live streaming, and a method of merging multi-ple datasets using anchor videos. We also benchmarked ex-isting video quality metrics on the new dataset, particularly over the novel scopes included in the database, to evaluate the effectiveness and efficiency of the existing models. We found that despite the strong overall performance over the entire database, most of the tested models perform poorly when predicting human preference for various encoding pa-rameters, such as frame rate and adaptive quantization.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114523064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sonar Image Composition for Semantic Segmentation Using Machine Learning","authors":"William Ard, Corina Barbalata","doi":"10.1109/WACVW58289.2023.00031","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00031","url":null,"abstract":"This paper presents an approach for merging side scan sonar data and bathymetry information for the benefit of improved automatic shipwreck identification. The steps to combine a raw side-scan sonar image with a 2D relief map into a new composite RGB image are presented in detail, and a supervised image segmentation approach via the U-Net architecture is implemented to identify shipwrecks. To validate the effectiveness of the approach, two datasets were created from shipwreck surveys: one using side-scan only, and one using the new composite RGB images. The U-Net model was trained and tested on each dataset, and the results were compared. The test results show a mean accuracy which is 15% higher for the case where the RGB composition is used when compared with the model trained and tested with the side-scan sonar only dataset. Furthermore, the mean intersection over union (IoU) shows an increase of 9.5% using the RGB composition model.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128945986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Temporal Context for Tiny Object Detection","authors":"Christof W. Corsel, Michel van Lier, L. Kampmeijer, N. Boehrer, E. Bakker","doi":"10.1109/WACVW58289.2023.00013","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00013","url":null,"abstract":"In surveillance applications, the detection of tiny, low-resolution objects remains a challenging task. Most deep learning object detection methods rely on appearance features extracted from still images and struggle to accurately detect tiny objects. In this paper, we address the problem of tiny object detection for real-time surveillance applications, by exploiting the temporal context available in video sequences recorded from static cameras. We present a spatiotemporal deep learning model based on YOLOv5 that exploits temporal context by processing sequences of frames at once. The model drastically improves the identification of tiny moving objects in the aerial surveillance and person detection domains, without degrading the detection of stationary objects. Additionally, a two-stream architecture that uses frame-difference as explicit motion information was proposed, further improving the detection of moving objects down to $4times 4$ pixels in size. Our approaches outperform previous work on the public WPAFB WAMI dataset, as well as surpassing previous work on an embedded NVIDIA Jetson Nano deployment in both accuracy and inference speed. We conclude that the addition of temporal context to deep learning object detectors is an effective approach to drastically improve the detection of tiny moving objects in static videos.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133556628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attentive Sensing for Long-Range Face Recognition","authors":"Hélio Perroni Filho, Aleksander Trajcevski, K. Bhargava, Nizwa Javed, J. Elder","doi":"10.1109/WACVW58289.2023.00068","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00068","url":null,"abstract":"To be effective, a social robot must reliably detect and recognize people in all visual directions and in both near and far fields. A major challenge is the resolution/field-of-view tradeoff; here we propose and evaluate a novel attentive sensing solution. Panoramic low-resolution pre-attentive sensing is provided by an array of wide-angle cameras, while attentive sensing is achieved with a high-resolution, narrow field-of-view camera and a mirror-based gaze deflection system. Quantitative evaluation on a novel dataset shows that this attentive sensing strategy can yield good panoramic face recognition performance in the wild out to distances of ~35m.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132180218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge-based Visual Context-Aware Framework for Applications in Robotic Services","authors":"Doosoo Chang, Bohyung Han","doi":"10.1109/WACVW58289.2023.00012","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00012","url":null,"abstract":"Recently, context awareness in vision technologies has become essential with the increasing demand for real-world applications, such as surveillance systems and service robots. However, implementing context awareness with an end-to-end learning-based system limits its extensibility and performance because the context varies in scope and type, but related data are mostly rare. To mitigate these limitations, we propose a visual context-aware frame-work composed of independent processes of visual perception and context inference. The framework performs logical inferences using the abstracted visual information of recognized objects and relationships based on our knowledge representation. We demonstrate the scalability and utility of the proposed framework through experimental cases that present stepwise context inferences applied to robotic services in different domains.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131756187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Observation Centric and Central Distance Recovery for Athlete Tracking","authors":"Hsiang-Wei Huang, Cheng-Yen Yang, Samartha Ramkumar, Chung-I Huang, Jenq-Neng Hwang, Pyong-Kun Kim, Kyoungoh Lee, Kwang-Ik Kim","doi":"10.1109/WACVW58289.2023.00050","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00050","url":null,"abstract":"Multi-Object Tracking on humans has improved rapidly with the development of object detection and re-identification algorithms. However, multi-actor tracking over humans with similar appearance and non-linear movement can still be very challenging even for the state-of-the-art tracking algorithm. Current motion-based tracking algorithms often use Kalman Filter to predict the motion of an object, however, its linear movement assumption can cause failure in tracking when the target is not moving linearly. And for multi-player tracking over the sports field, because the players on the same team are usually wearing the same color of jersey, making re-identification even harder both in the short term and long term in the tracking process. In this work, we proposed a motion-based tracking algorithm and three post-processing pipelines for three sports including basketball, football, and volleyball, we successfully handle the tracking of the non-linear movement of players on the sports fields. Experimental results achieved a HOTA of 73.968 on the testing set of ECCV DeeperAction Challenge SportsMOT Dataset and a HOTA of 49.97 on the McGill HPTDataset, showing the effectiveness of the proposed framework and its robustness in different sports including basketball, football, hockey, and volleyball.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127171578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-IVE: Privacy Enhancement of Multiple Soft-Biometrics in Face Embeddings","authors":"Pietro Melzi, H. O. Shahreza, C. Rathgeb, Rubén Tolosana, R. Vera-Rodríguez, Julian Fierrez, S. Marcel, C. Busch","doi":"10.1109/WACVW58289.2023.00036","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00036","url":null,"abstract":"This study focuses on the protection of soft-biometric at-tributes related to the demographic information of individ-uals that can be extracted from compact representations of face images, called embeddings. We consider a state-of-the-art technology for soft-biometric privacy enhancement, Incremental Variable Elimination (IVE), and propose Multi-IVE, a new method based on IVE to secure multiple soft-biometric attributes simultaneously. Several aspects of this technology are investigated, proposing different approaches to effectively identify and discard multiple soft-biometric at-tributes contained in face embeddings. In particular, we consider a domain transformation using Principle component Analysis (PCA), and apply IVE in the PCA domain. A complete analysis of the proposed Multi-IVE algorithm is carried out studying the embeddings generated by state-of-the-art face feature extractors, predicting soft-biometric attributes contained within them with multiple machine learning classifiers, and providing a cross-database evaluation. The results obtained show the possibility to simultane-ously secure multiple soft-biometric attributes and support the application of embedding domain transformations be-fore addressing the enhancement of soft-biometric privacy.","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130320426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Approach for Underwater Image Improvement: Deblurring, Dehazing, and Color Correction","authors":"Alejandro Rico Espinosa, Declan McIntosh, A. Albu","doi":"10.1109/WACVW58289.2023.00026","DOIUrl":"https://doi.org/10.1109/WACVW58289.2023.00026","url":null,"abstract":"As remotely operated underwater vehicles (ROV) and static underwater video and image collection platforms become more prevalent, there is a significant need for effective ways to increase the quality of underwater images at faster than real-time speeds. To this end, we present a novel state-of-the-art end-to-end deep learning architecture for underwater image enhancement focused on solving key image degradations related to blur, haze, and color casts and inference efficiency. Our proposed architecture builds from a minimal encoder-decoder structure to address these main underwater image degradations while maintaining efficiency. We use the discrete wavelet transform skip connections and channel attention modules to address haze and color corrections while preserving model efficiency. Our minimal architecture operates at 40 frames per second while scoring a structural similarity index (SSIM) of 0.8703 on the underwater image enhancement benchmark (UIEDB) dataset. These results show our method to be twice as fast as the previous state-of-the-art. We also present a variation of our proposed method with a second parallel deblurring branch for even more significant image improvement, which achieves an improved SSIM of 0.8802 while operating more efficiently than almost all comparable methods. The source code is available at https://github.com/alejorico98/underwater_ddc","PeriodicalId":306545,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130846759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}