{"title":"Blood Volume Pulse Signal Extraction based on Spatio-Temporal Low-Rank Approximation for Heart Rate Estimation","authors":"Kosuke Kurihara, Y. Maeda, D. Sugimura, T. Hamamoto","doi":"10.1109/VCIP56404.2022.10008871","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008871","url":null,"abstract":"We propose a novel blood volume pulse (BVP) signal extraction method for heart rate estimation that incorporates the self-similarity properties of BVP in the spatial and temporal domains. The main novelty of the proposed method is the incorporation of the temporal self-similarity of BVP via low-rank approximation in the time-delay coordinate system for BVP signal extraction. To make a low-rank approximation of BVP in the time domain, we introduce knowledge of linear time-invariant systems, i.e., the autoregressive (AR) model lies in the low-rank subspace in the time-delay coordinate system. In the medical field, it is widely known that BVP has quasi-periodic temporal characteristics owing to the cardiac pulse and exhibits self-similarity properties in the temporal domain. Hence, we model the temporal behavior of BVP as an AR process, allowing for a low-rank approximation of BVP in the time-delay coordinate system. Low-rank approximation of BVP in the time and spatial domains enables reliable BVP signal extraction, resulting in accurate heart rate estimation. The experiments demonstrate the effectiveness of the proposed method.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121062673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-information Aggregation Network for Fundus Image Quality Assessment","authors":"Yuan-Fang Li, Guanghui Yue, Lvyin Duan, Honglv Wu, Tianfu Wang","doi":"10.1109/VCIP56404.2022.10008858","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008858","url":null,"abstract":"Fundus image quality assessment (IQA) is essential for controlling the quality of retinal imaging and guaranteeing the reliability of diagnoses by ophthalmologists. Existing fundus IQA methods mainly explore local information to consider local distortions from convolutional neural networks (CNNs), yet ignoring global distortions. In this paper, we propose a novel multi-information aggregation network, termed MA-Net, for fundus IQA by extracting both local and global information. Specifically, MA-Net adopts an asymmetric dual-branch structure. For an input image, it uses the ResNet50 and vision transformer (ViT) to obtain the local and global representations from the upper and lower branches, respectively. In addition, MA-Net separately feed different images into the two branches to rank their quality for supplementing the feature representations. Thanks to the exploration of intra- and inter-class information between images, our MA-Net is competent for the fundus IQA task. Experiment results on the EyeQ dataset show that our MA-Net outperforms the baselines (i.e., ResNet50 and ViT) by 3.06% and 7.61% in Acc, and is superior to the mainstream methods.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133962090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Mesh Commonality Modeling Using the Cuboidal Partitioning","authors":"Ashek Ahmmed, M. Paul, M. Murshed, M. Pickering","doi":"10.1109/VCIP56404.2022.10008903","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008903","url":null,"abstract":"For 3D object representation, volumetric contents like meshes and point clouds provide suitable formats. However, a dynamic mesh sequence may require significantly large amount of data because it consists of information that varies with time. Hence, for the facilitation of storage and transmission of such content, efficient compression technologies are required. MPEG has started standardization activities aiming to develop a mesh compression standard that would be able to handle dynamic meshes with time varying connectivity information and time varying attribute maps. The attribute maps are features associated with the mesh surface and stored as 2D images/videos. In this paper, we propose to capture the commonality information in the dynamic mesh attribute maps using the cuboidal partitioning algorithm. This algorithm is capable of modeling both the global and local commonality within an image in a compact and computationally efficient way. Experimental results show that the proposed approach can outperform the anchor HEVC codec, suggested by MPEG to encode such sequences, with a bit rate savings of up to 3.66%.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133030704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral Analysis of Aerial Light Field for Optimization Sampling and Rendering of Unmanned Aerial Vehicle","authors":"Qiuming Liu, Yichen Wang, Ying Wei, Lei Xie, Changjian Zhu, Ruoxuan Zhou","doi":"10.1109/VCIP56404.2022.10008878","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008878","url":null,"abstract":"The aerial light field (ALF) can render higher quality images of large-scale 3D scenes. In this paper, we apply the ALF technology to study the image captured and novel view rendering of unmanned aerial vehicle (UAV), which exists some problems, such as large scene and depth of field. First, we design an ALF sampling model using spectral analysis of Fourier theory. Based on the ALF sampling model, the exact expression of ALF spectrum is derived. By the spectral support of ALF, we analyze the influence of pitch angle on light field sampling and its bandwidth. Particularly, the bandwidth of ALF can be applied to determine the minimum sampling rate for UAV. Additionally, we design a reconstruction filter that is related to pitch angle to render novel views of UAV. Finally, our experiments show that our sampling and rendering methods can improve the rendering quality of UAV novel view rendering.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114507208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Quality Assessment based on Quality Aggregation Networks","authors":"Wei Wo, Yingxue Zhang, Yaosi Hu, Zhenzhong Chen, Shan Liu","doi":"10.1109/VCIP56404.2022.10008817","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008817","url":null,"abstract":"A reliable video quality assessment (VQA) algorithm is essential for evaluating and optimizing video processing pipelines. In this paper, we propose a quality aggregation network (QAN) for full-reference VQA, which models the characteristics of human visual perception of video quality in both spatial and temporal domain. The proposed QAN is composed of two mod-ules, the spatial quality aggregation (SQA) network and the tem-poral quality aggregation (TQA) network. Specifically, the SQA network models the quality of video frames using 3D CNN, taking both spatial and temporal masking effects into consideration for the modeling of the perception of human visual system (HVS). In the TQA network, considering the memory effect of HVS facing the temporal variation of frame-level quality, an LSTM-based temporal quality pooling network is proposed to capture the nonlinearities and temporal dependencies involved in the process of quality evaluation. According to the experimental results on two well-established VQA databases, the proposed model could outperform the state-of-the-art metrics. The code of the proposed method is available at: https://github.com/lorenzowu/QAN.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129946720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Azimuth Adjustment Considering LiDAR Calibration for the Predictive Geometry Compression in G-PCC","authors":"Youguang Yu, Wei Zhang, Fuzheng Yang","doi":"10.1109/VCIP56404.2022.10008828","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008828","url":null,"abstract":"Point clouds captured by spinning Light Detection And Ranging (LiDAR) devices have played a significant role in many applications. To efficiently store and transmit such a huge amount of data, MPEG designed the Geometry-based Point Cloud Compression (G-PCC) standard, which includes a dedicated Pre-dictive Profile for LiDAR point clouds. In this scheme, Cartesian coordinates are mapped to the spherical representation using the elevation-related LiDAR calibration parameters to better characterize the spherical acquisition pattern of the spinning LiDAR device. As such, stronger spatial correlations exist among neighbouring nodes in the predictive structure, resulting in higher compression efficiency. However, it should be mentioned that the azimuth-related calibration parameters, which are unused, also impact the accuracy and correctness of the mapped spherical coordinates. In this paper, an azimuth adjustment method is proposed taking into account this impact. Experimental results show that the proposed azimuth adjustment can consistently improve the coding efficiency of G-PCC. Furthermore, a LiDAR calibration parameter estimation method is proposed in case the azimuth-related parameters are absent. Results show that the proposed calibration parameter estimation can precisely approximate the ground truth.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129543073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ultra-High Resolution Image Segmentation with Efficient Multi-Scale Collective Fusion","authors":"Guohao Sun, Haibin Yan","doi":"10.1109/VCIP56404.2022.10008877","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008877","url":null,"abstract":"Ultra-high resolution image segmentation has at-tracted increasing attention recently due to its wide applications in various scenarios such as road extraction and urban planning. The ultra-high resolution image facilitates the capture of more detailed information but also poses great challenges to the image understanding system. For memory efficiency, existing methods preprocess the global image and local patches into the same size, which can only exploit local patches of a fixed resolution. In this paper, we empirically analyze the effect of different patch sizes and input resolutions on the segmentation accuracy and propose a multi-scale collective fusion (MSCF) method to exploit information from multiple resolutions, which can be end-to-end trainable for more efficient training. Our method achieves very competitive performance on the widely-used DeepGlobe dataset while training on one single GPU.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121670242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Data Annotation Efficiency for Image Based Crowd Counting","authors":"Tianfang Ma, Shuoyan Liu, Qian Wang","doi":"10.1109/VCIP56404.2022.10008825","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008825","url":null,"abstract":"Crowd counting aims at automatically estimating the number of persons in still images. It has attracted much attention due to its potential usage in surveillance, intelligent transportation and many other scenarios. In the recent decade, most researchers have been focusing on the design of novel deep learning models for improved crowd counting performance. Such attempts include proposing advanced architectures of deep neural networks, using different training strategies and loss functions. Other than the capabilities of models, the crowd counting performance is also determined by the quantity and the quality of training data. Whilst the deep models are data-hungry and better performance can usually be expected with more training data, annotating images for training is time-consuming and expensive in real-world applications. In this work, we focus on the efficiency of data annotation for crowd counting. By varying the number of annotated images and the number of annotated points (one point is annotated per person head) for training, our experimental results demonstrate it is more efficient to annotate a small number of points per image across a large number of images for training. Based on this conclusion, we present a novel adaptive scaling mechanism for data augmentation to diversify the training images without extra annotation cost. The mechanism is proved effective via thorough experiments.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"488 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116193351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Regional Relation from Pixel-wise Annotation for Scene Parsing","authors":"Zichen Song, Hongliang Li, Heqian Qiu, Xiaoliang Zhang","doi":"10.1109/VCIP56404.2022.10008859","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008859","url":null,"abstract":"Scene parsing is an important and challenging task in computer vision, which assigns semantic labels to each pixel in the entire scene. Existing scene parsing methods only utilize pixel-wise annotation as the supervision of neural network, thus, some similar categories are easy to be misclassified in the complex scenes without the utilization of regional relation. To tackle these above challenging problems, a Regional Relation Network (RRNet) is proposed in this paper, which aims to boost the scene parsing performance by mining regional relation from pixel-wise annotation. Specifically, the pixel-wise annotation is divided into a lot of fixed regions, so that intra- and inter-regional relation are able to be extracted as the supervision of network. We firstly design an intra-regional relation module to predict category distribution in each fixed region, which is helpful for reducing the misclassification phenomenon in regions. Secondly, an inter-regional relation module is proposed to learn the relationships among each region in scene images. With the guideline of relation information extracted from the ground truth, the network is able to learn more discriminative relation representations. To validate our proposed model, we conduct experiments on three typical datasets, including NYU-depth-v2, PASCAL-Context and ADE20k. The achieved competitive results on all three datasets demonstrate the effectiveness of our method.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"728 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126948980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BAM: A Bidirectional Attention Module for Masked Face Recognition","authors":"M. S. Shakeel","doi":"10.1109/VCIP56404.2022.10008847","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008847","url":null,"abstract":"Masked Face Recognition (MFR) is a recent addition to the directory of existing challenges in facial biometrics. Due to the limited exposure of facial regions due to mask-occlusion, it is essential to exploit the available non-occluded regions as much as possible for identity feature learning. Aiming to address this issue, we propose a dual-branch bidirectional attention module (BAM), which consists of a spatial attention block (SAB) and a channel attention block (CAB) in each branch. In the first stage, the SAB performs bidirectional interactions between the original feature map and its augmented version to highlight informative spatial locations for feature learning. The learned bidirectional spatial attention maps are then passed through a channel attention block (CAB) to assign high weights to only informative feature channels. Finally, the channel-wise calibrated feature responses are fused to generate a final attention-aware feature representation for MFR. Extensive experiments indicate that our proposed BAM is superior to various state-of-the-art methods in terms of recognizing mask-occluded face images under complex facial variations.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134103888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}