Mariona Carós, Ariadna Just, S. Seguí, Jordi Vitrià
{"title":"Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR data","authors":"Mariona Carós, Ariadna Just, S. Seguí, Jordi Vitrià","doi":"10.23919/MVA57639.2023.10216191","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216191","url":null,"abstract":"Airborne LiDAR systems have the capability to capture the Earth’s surface by generating extensive point cloud data comprised of points mainly defined by 3D coordinates. However, labeling such points for supervised learning tasks is time-consuming. As a result, there is a need to investigate techniques that can learn from unlabeled data to significantly reduce the number of annotated samples. In this work, we propose to train a self-supervised encoder with Barlow Twins and use it as a pre-trained network in the task of semantic scene segmentation. The experimental results demonstrate that our unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115157357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Swetha, Rajeshreddy Datla, Vishnu Chalavadi, K. C.
{"title":"MS-VACSNet: A Network for Multi-scale Volcanic Ash Cloud Segmentation in Remote Sensing Images","authors":"G. Swetha, Rajeshreddy Datla, Vishnu Chalavadi, K. C.","doi":"10.23919/MVA57639.2023.10215928","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215928","url":null,"abstract":"The segmentation of volcanic ash clouds in remote sensing images provides valuable insights for studying volcanic deformation, forecasting, tracking, and hazard monitoring. However, delineating the boundary of volcanic eruptions is difficult because the scale of eruptions varies across remote sensing images. In this paper, we propose a network for multi-scale volcanic ash cloud segmentation (MS-VACSNet) in remote sensing images. The proposed MS-VACSNet uses U-Net as a baseline with a few improvements in the encoder and decoder sub-networks. Specifically, we employ dilated convolutions to capture contextual information while delineating volcanic eruptions of different scales. We conducted experiments on 10 active volcanic regions across the globe using MODIS thermal and infrared images. The experimental results show that MS-VACSNet improves the Dice score by 5% over state-of-the-art segmentation approaches in segmenting volcanic ash clouds.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124211786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalizable Solar Irradiation Prediction using Large Transformer Models with Sky Imagery","authors":"Kuber Reddy Gorantla, Aditi Roy","doi":"10.23919/MVA57639.2023.10216081","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216081","url":null,"abstract":"Deploying solar power systems in new locations imposes several challenges on the operation of local and regional power grids due to the inherent variation in ground-level solar irradiance. This work proposes a novel real-time solar now-casting methodology for solar irradiance prediction based on deep transfer learning from ground-based sky imagery. Existing approaches use statistical methods or Convolutional Neural Networks for irradiation regression that are trained for a particular location and cannot be transferred to new locations deploying potentially different imaging sensors. This observation motivated us to introduce a large deep neural network based on Vision Transformers that is generalizable and transferable to different scenarios. The system is developed using multiple years of solar irradiance and sky image recordings in two locations. We captured our own dataset in Princeton, NJ, USA and also used the open-source ASI16 benchmark dataset captured in Golden, CO, USA. The method is validated on these two locations, which differ in geographic and climatic conditions and in sensors. Results show that the proposed method is robust and highly accurate (85-90% accuracy) for multi-location deployment, with 50% less data required from new locations.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132646861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Randomized Time Warping for Action Recognition","authors":"Yutaro Hiraoka, K. Fukui","doi":"10.23919/MVA57639.2023.10216189","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216189","url":null,"abstract":"This paper proposes an enhanced Randomized Time Warping (RTW) using CNN features, termed Deep RTW, for motion recognition. RTW is a general extension of Dynamic Time Warping (DTW), widely used for matching and comparing sequential patterns. The basic idea of RTW is to simultaneously calculate the similarities between many pairs of various warped patterns, i.e., time elastic (TE) features generated by randomly sampling the sequential pattern while retaining their temporal order. This mechanism enables RTW to handle changes in motion speed flexibly. However, naive TE feature vectors generated from raw images are not expected to have high discriminative power. Besides, the dimension of TE features can increase depending on the number of concatenated images. To address these limitations, we incorporate features extracted from 2D/3D CNNs into the RTW framework as input. Our framework is very simple but effective and applicable to various types of CNN architecture. Extensive experiments on public motion datasets, Jester and Something-Something V2, support the advantage of our method over the original CNNs.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134554458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer with Task Selection for Continual Learning","authors":"Sheng-Kai Huang, Chun-Rong Huang","doi":"10.23919/MVA57639.2023.10215673","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215673","url":null,"abstract":"The goal of continual learning is to let models continuously learn new incoming knowledge without catastrophic forgetting. To address this issue, we propose a transformer-based framework with a task selection module. The task selection module selects the corresponding task tokens to assist the learning of incoming samples of new tasks. For previous samples, the selected task tokens retain the previous knowledge to assist the prediction of samples of learned classes. Compared with state-of-the-art methods, our method achieves good performance on the CIFAR-100 dataset, especially when testing on the last task, which shows that our method can better prevent catastrophic forgetting.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130938811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating self-supervised learning for Skin Lesion Classification","authors":"Takumi Morita, X. Han","doi":"10.23919/MVA57639.2023.10215580","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215580","url":null,"abstract":"Skin cancer is one of the most common cancers worldwide, and is a rising global health issue due to damage to the natural protection from harmful ultraviolet radiation. Early diagnosis and proper treatment, even for the deadliest malignant melanoma, can greatly increase the survival rate. Thus, computer-aided diagnosis for skin lesions has been actively explored and has made remarkable progress in medical practice, benefiting from the great advances of deep convolutional neural networks in vision tasks. However, most studies in skin lesion/cancer recognition and detection focus on constructing a robust prediction model with annotated training samples in a fully-supervised manner, and cannot make full use of the available unlabeled data. This study investigates self-supervised learning using a large amount of unlabeled skin lesion images to train a good initial network for representation learning, and transfers the knowledge of the initial model to the supervised skin lesion classification task with a small number of annotated samples to enhance performance. Specifically, we employ a negative-sample-free self-supervised framework that leverages the interaction learning of the online and target networks to enforce representation robustness with only positive samples. Moreover, based on the observed potential variations in the target skin images, we select adaptive augmentation methods to produce the transformed positive views for self-supervised learning. Extensive experiments on two benchmark skin lesion datasets demonstrate that the proposed self-supervised pre-training stably improves recognition performance with different numbers of labeled images compared with the baseline models.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132750801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yeongnam Chae, Poulami Raha, Mijung Kim, B. Stenger
{"title":"Age Prediction From Face Images Via Contrastive Learning","authors":"Yeongnam Chae, Poulami Raha, Mijung Kim, B. Stenger","doi":"10.23919/MVA57639.2023.10216074","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216074","url":null,"abstract":"This paper presents a novel approach for accurately estimating age from face images, which overcomes the challenge of collecting a large dataset of individuals with the same identity at different ages. Instead, we leverage readily available face datasets of different people at different ages and aim to extract age-related features using contrastive learning. Our method emphasizes these relevant features while suppressing identity-related features using a combination of cosine similarity and triplet margin losses. We demonstrate the effectiveness of our proposed approach by achieving state-of-the-art performance on two public datasets, FG-NET and MORPH II.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115370006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhe Xu, Yuan Li, Yuhong Li, Songlin Du, T. Ikenaga
{"title":"Hierarchical Spatio-Temporal Neural Network with Displacement Based Refinement for Monocular Head Pose Prediction","authors":"Zhe Xu, Yuan Li, Yuhong Li, Songlin Du, T. Ikenaga","doi":"10.23919/MVA57639.2023.10216167","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216167","url":null,"abstract":"Head pose prediction aims to forecast future head pose given an observed sequence, and plays an increasingly important role in human-computer interaction, virtual reality, and driver monitoring. However, since there are many possible motions, current head pose works, which mainly focus on estimation, fail to provide sufficient temporal information to meet the high demands for accurate predictions. This paper proposes (A) a Spatio-Temporal Encoder (STE), (B) a displacement based offset generating module, and (C) a time step feature aggregation module. The STE extracts spatial information via Transformer and temporal information according to the time order of frames. The displacement based offset generating module utilizes displacement information through a frequency domain process between adjacent frames to generate an offset to refine the prediction result. Furthermore, the time step feature aggregation module integrates time step features based on the information density and hierarchically extracts past motion information as prior knowledge to capture the motion recurrence. Extensive experiments have shown that the proposed network outperforms related methods, achieving a Mean Absolute Error (MAE) of 4.5865° on simple background sequences and 7.1325° on complex background sequences.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115430653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Learning with Group Relation and Individual Action","authors":"Chihiro Nakatani, Hiroaki Kawashima, N. Ukita","doi":"10.23919/MVA57639.2023.10215994","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215994","url":null,"abstract":"This paper proposes a method for group relation learning. Different from related work in which the manual annotation of group activities is required for supervised learning, we propose group relation learning without group activity annotation through recognition of individual action that can be more easily annotated than group activities defined with complex inter-people relationships. Our method extracts features informative for recognizing the action of each person by conditioning the group relation with the location of this person. A variety of experimental results demonstrate that our method outperforms SOTA methods quantitatively and qualitatively on two public datasets.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114423563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pere Gilabert, C. Malagelada, Hagen Wenzek, Jordi Vitrià, S. Seguí
{"title":"Leveraging Embedding Information to Create Video Capsule Endoscopy Datasets","authors":"Pere Gilabert, C. Malagelada, Hagen Wenzek, Jordi Vitrià, S. Seguí","doi":"10.23919/MVA57639.2023.10215919","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215919","url":null,"abstract":"As the field of deep learning continues to expand, it has become increasingly apparent that large volumes of data are needed to train algorithms effectively. This is particularly challenging in the endoscopic capsule field, where obtaining and labeling sufficient data can be expensive and time-consuming. To overcome these challenges, we have developed an automatic method of video selection that uses the diversity of unlabeled videos to identify the most relevant videos for labeling. The findings indicate a significant improvement in performance with the implementation of this new methodology. The system selects relevant and diverse videos, achieving high accuracy in the classification task. This translates to less workload for annotators as they can label fewer videos while maintaining the same accuracy level in the classification task.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114795101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}