{"title":"Deep Learning Classification of Large-Scale Point Clouds: A Case Study on Cuneiform Tablets","authors":"Frederik Hagelskjær","doi":"10.1109/ICIP46576.2022.9898032","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9898032","url":null,"abstract":"This paper introduces a novel network architecture for the classification of large-scale point clouds. The network is used to classify metadata from cuneiform tablets. As more than half a million tablets remain unprocessed, this can help create an overview of the tablets. The network is tested on a comparison dataset and obtains state-of-the-art performance. We also introduce new metadata classification tasks on which the network shows promising results. Finally, we introduce the novel Maximum Attention visualization, demonstrating that the trained network focuses on the intended features. Code available at https://github.com/fhagelskjaer/dlc-cuneiform","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115432825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational Depth Estimation on Hypersphere for Panorama","authors":"Jingbo Miao, Yanwei Liu, Kan Wang, Jinxia Liu, Zhen Xu","doi":"10.1109/ICIP46576.2022.9897914","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897914","url":null,"abstract":"Depth estimation for panorama is a key part of 3D scene understanding, and adopting discriminative models is the most common solution. However, due to the rectangular convolution kernel, these existing learning methods cannot efficiently extract the distorted features in panoramas. To this end, we propose OmniVAE, a generative model based on Conditional Variational Auto-Encoder (CVAE) and von Mises-Fisher (vMF) distribution, to strengthen the exclusive generative ability for spherical signals by mapping panoramas to hypersphere space. Further, to alleviate the side effects of manifold-mismatching caused by non-planar distribution, we put forward the Atypical Receptive Field (ARF) module to slightly shift the receptive field of the network and even take the distribution difference into account in the reconstruction loss. The quantitative and qualitative evaluations are performed on real-world and synthetic datasets, and the results show that OmniVAE outperforms the state-of-the-art methods.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"206 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115466924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Regularization for Retinex Decomposition of Low-Light Images","authors":"Arthur Lecert, R. Fraisse, A. Roumy, C. Guillemot","doi":"10.1109/ICIP46576.2022.9897893","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897893","url":null,"abstract":"We study unsupervised Retinex decomposition for low-light image enhancement. Being an underdetermined problem with infinitely many solutions, well-suited priors are required to reduce the solution space. In this paper, we analyze the characteristics of low-light images and their illumination component, and identify a trivial solution not taken into consideration by previous unsupervised state-of-the-art methods. The challenge comes from the fact that this trivial solution cannot be completely eliminated from the feasible set, as it corresponds to the true solution when the low-light image contains a light source or an overexposed area. To address this issue, we propose a new regularization term which removes only absurd solutions and keeps plausible ones in the set. To demonstrate the efficiency of the proposed prior, we conduct our experiments using deep image priors in a framework similar to the recent work RetinexDIP, together with an in-depth ablation study. Finally, we observe no more halo artefacts in the restored image. For all but one metric, our unsupervised approach gives results as good as the supervised state of the art, indicating the potential of this framework for low-light image enhancement.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115883934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Masked Face Recognition via Self-Attention Based Local Consistency Regularization","authors":"Dongyun Lin, Yiqun Li, Yi Cheng, S. Prasad, Aiyuan Guo","doi":"10.1109/ICIP46576.2022.9898076","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9898076","url":null,"abstract":"With the COVID-19 pandemic, wearing masks has become a critical measure against infection. This measure poses a huge challenge to existing face recognition systems by introducing heavy occlusions. In this paper, we propose an effective masked face recognition system. To alleviate the challenge of mask occlusion, we first exploit RetinaFace to achieve robust masked face detection and alignment. Secondly, we propose a deep CNN for masked face recognition, trained by minimizing the ArcFace loss together with a local consistency regularization (LCR) loss. This enables the network to simultaneously learn globally discriminative face representations of different identities and locally consistent representations between non-occluded faces and their counterparts wearing synthesized facial masks. Experiments on the masked LFW dataset demonstrate that the proposed system produces superior masked face recognition performance over multiple state-of-the-art methods. The proposed method is implemented on a portable Jetson Nano device, achieving real-time masked face recognition.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124284531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-Source Image Matching Network for UAV Visual Location","authors":"C. Li, Ganchao Liu, Yuan Yuan","doi":"10.1109/ICIP46576.2022.9897631","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897631","url":null,"abstract":"Visual localization is an important but challenging task for unmanned aerial vehicles (UAV). Matching real-time UAV orthophotos to pre-existing georeferenced satellite images is the key problem for this task. However, UAV and satellite images are inconsistent in image styles, perspectives, and times. In this paper, a new fully convolutional siamese network is proposed to extract similar features for multi-source images. The Squeeze-and-Excitation structure is integrated into the densely connected network to adapt to multi-scale features and the texture differences of different regions. Besides, a loss function with a progressive sampling strategy is utilized to mine the similarity of matching multi-source images and improve the description compactness among dimensions. Extensive experimental results with in-depth analysis are provided, which indicate that the proposed framework can significantly improve the matching performance of the learned descriptor.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124342563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Visual Feature and Gaze Driven Egocentric Video Retargeting","authors":"Aneesh Bhattacharya, S. Malladi, J. Mukhopadhyay","doi":"10.1109/ICIP46576.2022.9897908","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897908","url":null,"abstract":"Egocentric vision data has become popular due to its unique way of capturing the first-person perspective. However, such videos are lengthy and contain redundant information and visual noise caused by head movements, which disrupts the story being expressed through them. This paper proposes a novel visual feature and gaze driven approach to retarget egocentric videos following the principles of cinematography. The approach is divided into two parts: activity-based scene detection, and panning and zooming to produce visually immersive videos. Firstly, visually similar frames are grouped using DCT feature matching followed by SURF descriptor matching. These groups are further refined using gaze data to generate the different scenes and transitions occurring within an activity. Secondly, the mean 2D gaze positions of scenes are used to generate panning windows enclosing 75% of the frame content, which drive zoom-in and zoom-out operations in the detected scenes and transitions, respectively. Our approach has been tested on the GTEA and EGTEA Gaze+ datasets, achieving an average accuracy of 88.1% and 72% for sub-activity identification, an average aspect ratio similarity (ARS) score of 0.967 and 0.73, and a SIFT similarity index (SSI) of 60% and 42%, respectively. Code available on GitHub.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114368584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Pedestrian Intent Prediction via Features Estimation","authors":"Nada Osman, Enrico Cancelli, Guglielmo Camporese, Pasquale Coscia, Lamberto Ballan","doi":"10.1109/ICIP46576.2022.9897636","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897636","url":null,"abstract":"Anticipating human motion is an essential requirement for autonomous vehicles and robots, primarily in order to guarantee people's safety. In urban scenarios, they interact with humans, the surrounding environment, and other vehicles, relying on several cues to forecast crossing or not-crossing intentions. For these reasons, this challenging task is often tackled using both visual and non-visual features to anticipate future actions from 2 s to 1 s before the event. Our work primarily aims to revise this standard evaluation protocol to forecast crossing events as early as possible. To this end, we build our solution upon an extensively used model for egocentric action anticipation (RU-LSTM), proposing to envision future features, or modalities, that can better infer human intentions using an attention-based fusion mechanism. We validate our model on the JAAD and PIE datasets and demonstrate that an intent prediction model can benefit from these additional cues for anticipating pedestrian crossing events.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114899446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining Point Cloud Boundaries Using Pseudopotential Scalar Field Implicit Surfaces","authors":"Ethan Payne, Amanda Fernandez","doi":"10.1109/ICIP46576.2022.9897175","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897175","url":null,"abstract":"Identifying smooth and meaningful object boundaries of noisy 3D point-clouds presents a challenge. Rather than rely on the points of the cloud itself, we identify a smooth implicit surface to represent the boundary of the cloud. By constructing a scalar field using a semantically-informative pseudopotential function, we take an arbitrary-resolution iso-surface and apply standard computer vision morphological transformations and edge detection on 2D slices of the pseudopotential field. When recombined, these slices comprise a new point-cloud representing the 3D boundary of the object as determined by the chosen isosurface. Our method leverages the strength and accessibility of 2D vision tools to identify smooth and semantically significant boundaries of ill-defined 3D objects, and additionally provides a continuous scalar field containing insight regarding the internal structure of the object. Our method enables a powerful and easily implementable pipeline for 3D boundary identification, particularly in domains where natural candidates for pseudopotential functions are already present.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116017055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Zero-Latency Video Transmission Through Frame Extrapolation","authors":"Melan Vijayaratnam, Marco Cagnazzo, G. Valenzise, Anthony Trioux, M. Kieffer","doi":"10.1109/ICIP46576.2022.9897958","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897958","url":null,"abstract":"In the past few years, several efforts have been devoted to reducing individual sources of latency in video delivery, including acquisition, coding, and network transmission. The goal is to improve the quality of experience in applications requiring real-time interaction. Nevertheless, these efforts are fundamentally constrained by technological and physical limits. In this paper, we investigate a radically different approach that can arbitrarily reduce the overall latency by means of video extrapolation. We propose two latency compensation schemes where video extrapolation is performed either at the encoder or at the decoder side. Since a loss of fidelity is the price to pay for compensating latency arbitrarily, we study the latency-fidelity compromise using three recent video prediction schemes. Our preliminary results show that, by accepting a quality loss, we can compensate a typical latency of 100 ms with a loss of 8 dB in PSNR with the best extrapolator. This approach is promising but also suggests that further work should be done in video prediction to pursue zero-latency video transmission.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116267149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Domain-Adaptive Person Re-Identification with Multi-Camera Constraints","authors":"S. Takeuchi, Fei Li, Sho Iwasaki, Jiaqi Ning, Genta Suzuki","doi":"10.1109/ICIP46576.2022.9897377","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897377","url":null,"abstract":"Person re-identification is a key technology for analyzing video-based human behavior; however, its application is still challenging in practical situations due to the performance degradation for domains different from those in the training data. Here, we propose an environment-constrained adaptive network for reducing the domain gap. This network refines pseudo-labels estimated via a self-training scheme by imposing multi-camera constraints. The proposed method incorporates person-pair information without person identity labels obtained from the environment into the model training. In addition, we develop a method that appropriately selects a person from the pair that contributes to the performance improvement. We evaluate the performance of the network using public and private datasets and confirm the performance surpasses state-of-the-art methods in domains with overlapping camera views. To the best of our knowledge, this is the first study on domain-adaptive learning with multi-camera constraints that can be obtained in real environments.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116460118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}