{"title":"Blood Volume Pulse Signal Extraction based on Spatio-Temporal Low-Rank Approximation for Heart Rate Estimation","authors":"Kosuke Kurihara, Y. Maeda, D. Sugimura, T. Hamamoto","doi":"10.1109/VCIP56404.2022.10008871","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008871","url":null,"abstract":"We propose a novel blood volume pulse (BVP) signal extraction method for heart rate estimation that incorporates the self-similarity properties of BVP in the spatial and temporal domains. The main novelty of the proposed method is the incorporation of the temporal self-similarity of BVP via low-rank approximation in the time-delay coordinate system for BVP signal extraction. To make a low-rank approximation of BVP in the time domain, we introduce knowledge of linear time-invariant systems, i.e., the autoregressive (AR) model lies in the low-rank subspace in the time-delay coordinate system. In the medical field, it is widely known that BVP has quasi-periodic temporal characteristics owing to the cardiac pulse and exhibits self-similarity properties in the temporal domain. Hence, we model the temporal behavior of BVP as an AR process, allowing for a low-rank approximation of BVP in the time-delay coordinate system. Low-rank approximation of BVP in the time and spatial domains enables reliable BVP signal extraction, resulting in accurate heart rate estimation. The experiments demonstrate the effectiveness of the proposed method.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121062673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-information Aggregation Network for Fundus Image Quality Assessment","authors":"Yuan-Fang Li, Guanghui Yue, Lvyin Duan, Honglv Wu, Tianfu Wang","doi":"10.1109/VCIP56404.2022.10008858","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008858","url":null,"abstract":"Fundus image quality assessment (IQA) is essential for controlling the quality of retinal imaging and guaranteeing the reliability of diagnoses by ophthalmologists. Existing fundus IQA methods mainly explore local information to consider local distortions from convolutional neural networks (CNNs), yet ignoring global distortions. In this paper, we propose a novel multi-information aggregation network, termed MA-Net, for fundus IQA by extracting both local and global information. Specifically, MA-Net adopts an asymmetric dual-branch structure. For an input image, it uses the ResNet50 and vision transformer (ViT) to obtain the local and global representations from the upper and lower branches, respectively. In addition, MA-Net separately feed different images into the two branches to rank their quality for supplementing the feature representations. Thanks to the exploration of intra- and inter-class information between images, our MA-Net is competent for the fundus IQA task. Experiment results on the EyeQ dataset show that our MA-Net outperforms the baselines (i.e., ResNet50 and ViT) by 3.06% and 7.61% in Acc, and is superior to the mainstream methods.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133962090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Mesh Commonality Modeling Using the Cuboidal Partitioning","authors":"Ashek Ahmmed, M. Paul, M. Murshed, M. Pickering","doi":"10.1109/VCIP56404.2022.10008903","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008903","url":null,"abstract":"For 3D object representation, volumetric contents like meshes and point clouds provide suitable formats. However, a dynamic mesh sequence may require significantly large amount of data because it consists of information that varies with time. Hence, for the facilitation of storage and transmission of such content, efficient compression technologies are required. MPEG has started standardization activities aiming to develop a mesh compression standard that would be able to handle dynamic meshes with time varying connectivity information and time varying attribute maps. The attribute maps are features associated with the mesh surface and stored as 2D images/videos. In this paper, we propose to capture the commonality information in the dynamic mesh attribute maps using the cuboidal partitioning algorithm. This algorithm is capable of modeling both the global and local commonality within an image in a compact and computationally efficient way. Experimental results show that the proposed approach can outperform the anchor HEVC codec, suggested by MPEG to encode such sequences, with a bit rate savings of up to 3.66%.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133030704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral Analysis of Aerial Light Field for Optimization Sampling and Rendering of Unmanned Aerial Vehicle","authors":"Qiuming Liu, Yichen Wang, Ying Wei, Lei Xie, Changjian Zhu, Ruoxuan Zhou","doi":"10.1109/VCIP56404.2022.10008878","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008878","url":null,"abstract":"The aerial light field (ALF) can render higher quality images of large-scale 3D scenes. In this paper, we apply the ALF technology to study the image captured and novel view rendering of unmanned aerial vehicle (UAV), which exists some problems, such as large scene and depth of field. First, we design an ALF sampling model using spectral analysis of Fourier theory. Based on the ALF sampling model, the exact expression of ALF spectrum is derived. By the spectral support of ALF, we analyze the influence of pitch angle on light field sampling and its bandwidth. Particularly, the bandwidth of ALF can be applied to determine the minimum sampling rate for UAV. Additionally, we design a reconstruction filter that is related to pitch angle to render novel views of UAV. Finally, our experiments show that our sampling and rendering methods can improve the rendering quality of UAV novel view rendering.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114507208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Quality Assessment based on Quality Aggregation Networks","authors":"Wei Wo, Yingxue Zhang, Yaosi Hu, Zhenzhong Chen, Shan Liu","doi":"10.1109/VCIP56404.2022.10008817","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008817","url":null,"abstract":"A reliable video quality assessment (VQA) algorithm is essential for evaluating and optimizing video processing pipelines. In this paper, we propose a quality aggregation network (QAN) for full-reference VQA, which models the characteristics of human visual perception of video quality in both spatial and temporal domain. The proposed QAN is composed of two mod-ules, the spatial quality aggregation (SQA) network and the tem-poral quality aggregation (TQA) network. Specifically, the SQA network models the quality of video frames using 3D CNN, taking both spatial and temporal masking effects into consideration for the modeling of the perception of human visual system (HVS). In the TQA network, considering the memory effect of HVS facing the temporal variation of frame-level quality, an LSTM-based temporal quality pooling network is proposed to capture the nonlinearities and temporal dependencies involved in the process of quality evaluation. According to the experimental results on two well-established VQA databases, the proposed model could outperform the state-of-the-art metrics. The code of the proposed method is available at: https://github.com/lorenzowu/QAN.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129946720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Azimuth Adjustment Considering LiDAR Calibration for the Predictive Geometry Compression in G-PCC","authors":"Youguang Yu, Wei Zhang, Fuzheng Yang","doi":"10.1109/VCIP56404.2022.10008828","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008828","url":null,"abstract":"Point clouds captured by spinning Light Detection And Ranging (LiDAR) devices have played a significant role in many applications. To efficiently store and transmit such a huge amount of data, MPEG designed the Geometry-based Point Cloud Compression (G-PCC) standard, which includes a dedicated Pre-dictive Profile for LiDAR point clouds. In this scheme, Cartesian coordinates are mapped to the spherical representation using the elevation-related LiDAR calibration parameters to better characterize the spherical acquisition pattern of the spinning LiDAR device. As such, stronger spatial correlations exist among neighbouring nodes in the predictive structure, resulting in higher compression efficiency. However, it should be mentioned that the azimuth-related calibration parameters, which are unused, also impact the accuracy and correctness of the mapped spherical coordinates. In this paper, an azimuth adjustment method is proposed taking into account this impact. Experimental results show that the proposed azimuth adjustment can consistently improve the coding efficiency of G-PCC. Furthermore, a LiDAR calibration parameter estimation method is proposed in case the azimuth-related parameters are absent. Results show that the proposed calibration parameter estimation can precisely approximate the ground truth.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129543073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ultra-High Resolution Image Segmentation with Efficient Multi-Scale Collective Fusion","authors":"Guohao Sun, Haibin Yan","doi":"10.1109/VCIP56404.2022.10008877","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008877","url":null,"abstract":"Ultra-high resolution image segmentation has at-tracted increasing attention recently due to its wide applications in various scenarios such as road extraction and urban planning. The ultra-high resolution image facilitates the capture of more detailed information but also poses great challenges to the image understanding system. For memory efficiency, existing methods preprocess the global image and local patches into the same size, which can only exploit local patches of a fixed resolution. In this paper, we empirically analyze the effect of different patch sizes and input resolutions on the segmentation accuracy and propose a multi-scale collective fusion (MSCF) method to exploit information from multiple resolutions, which can be end-to-end trainable for more efficient training. Our method achieves very competitive performance on the widely-used DeepGlobe dataset while training on one single GPU.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121670242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Data Annotation Efficiency for Image Based Crowd Counting","authors":"Tianfang Ma, Shuoyan Liu, Qian Wang","doi":"10.1109/VCIP56404.2022.10008825","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008825","url":null,"abstract":"Crowd counting aims at automatically estimating the number of persons in still images. It has attracted much attention due to its potential usage in surveillance, intelligent transportation and many other scenarios. In the recent decade, most researchers have been focusing on the design of novel deep learning models for improved crowd counting performance. Such attempts include proposing advanced architectures of deep neural networks, using different training strategies and loss functions. Other than the capabilities of models, the crowd counting performance is also determined by the quantity and the quality of training data. Whilst the deep models are data-hungry and better performance can usually be expected with more training data, annotating images for training is time-consuming and expensive in real-world applications. In this work, we focus on the efficiency of data annotation for crowd counting. By varying the number of annotated images and the number of annotated points (one point is annotated per person head) for training, our experimental results demonstrate it is more efficient to annotate a small number of points per image across a large number of images for training. Based on this conclusion, we present a novel adaptive scaling mechanism for data augmentation to diversify the training images without extra annotation cost. The mechanism is proved effective via thorough experiments.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"488 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116193351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Regional Relation from Pixel-wise Annotation for Scene Parsing","authors":"Zichen Song, Hongliang Li, Heqian Qiu, Xiaoliang Zhang","doi":"10.1109/VCIP56404.2022.10008859","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008859","url":null,"abstract":"Scene parsing is an important and challenging task in computer vision, which assigns semantic labels to each pixel in the entire scene. Existing scene parsing methods only utilize pixel-wise annotation as the supervision of neural network, thus, some similar categories are easy to be misclassified in the complex scenes without the utilization of regional relation. To tackle these above challenging problems, a Regional Relation Network (RRNet) is proposed in this paper, which aims to boost the scene parsing performance by mining regional relation from pixel-wise annotation. Specifically, the pixel-wise annotation is divided into a lot of fixed regions, so that intra- and inter-regional relation are able to be extracted as the supervision of network. We firstly design an intra-regional relation module to predict category distribution in each fixed region, which is helpful for reducing the misclassification phenomenon in regions. Secondly, an inter-regional relation module is proposed to learn the relationships among each region in scene images. With the guideline of relation information extracted from the ground truth, the network is able to learn more discriminative relation representations. To validate our proposed model, we conduct experiments on three typical datasets, including NYU-depth-v2, PASCAL-Context and ADE20k. The achieved competitive results on all three datasets demonstrate the effectiveness of our method.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"728 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126948980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BAM: A Bidirectional Attention Module for Masked Face Recognition","authors":"M. S. Shakeel","doi":"10.1109/VCIP56404.2022.10008847","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008847","url":null,"abstract":"Masked Face Recognition (MFR) is a recent addition to the directory of existing challenges in facial biometrics. Due to the limited exposure of facial regions due to mask-occlusion, it is essential to exploit the available non-occluded regions as much as possible for identity feature learning. Aiming to address this issue, we propose a dual-branch bidirectional attention module (BAM), which consists of a spatial attention block (SAB) and a channel attention block (CAB) in each branch. In the first stage, the SAB performs bidirectional interactions between the original feature map and its augmented version to highlight informative spatial locations for feature learning. The learned bidirectional spatial attention maps are then passed through a channel attention block (CAB) to assign high weights to only informative feature channels. Finally, the channel-wise calibrated feature responses are fused to generate a final attention-aware feature representation for MFR. Extensive experiments indicate that our proposed BAM is superior to various state-of-the-art methods in terms of recognizing mask-occluded face images under complex facial variations.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134103888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}