2022 IEEE International Conference on Visual Communications and Image Processing (VCIP): Latest Publications

STSI: Efficiently Mine Spatio-Temporal Semantic Information between Different Multimodal for Video Captioning
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008808
Huiyu Xiong, Lanxiao Wang
{"title":"STSI: Efficiently Mine Spatio- Temporal Semantic Information between Different Multimodal for Video Captioning","authors":"Huiyu Xiong, Lanxiao Wang","doi":"10.1109/VCIP56404.2022.10008808","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008808","url":null,"abstract":"As one of the challenging tasks in computer vision, video captioning needs to use natural language to describe the content of video. Video contains complex information, such as semantic information, time information and so on. How to synthesize sentences effectively from rich and different kinds of information is very significant. The existing methods often cannot well integrate the multimodal feature to predict the association between different objects in video. In this paper, we improve the existing encoder-decoder structure and propose a network deeply mining the spatio-temporal correlation between multimodal features. Through the analysis of sentence components, we use spatio-temporal semantic information mining module to fuse the object, 2D and 3D features in both time and space. It is worth mentioning that the word output at the previous time is added as the prediction branch of auxiliary conjunctions. After that, a dynamic gumbel scorer is used to output caption sentences that are more consistent with the facts. The experimental results on two benchmark datasets show that our STSI is superior to the state-of-the-art methods while generating more reasonable and semantic-logical sentences.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123002008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
No-reference Stereoscopic Image Quality Assessment Based on Parallel Multi-scale Perception
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008875
Ziyi Zhang, Sumei Li
{"title":"No-reference Stereoscopic Image Quality Assessment Based on Parallel Multi-scale Perception","authors":"Ziyi Zhang, Sumei Li","doi":"10.1109/VCIP56404.2022.10008875","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008875","url":null,"abstract":"With the rapid development of 3D technologies, effective no-reference stereoscopic image quality assessment (NR-SIQA) methods are in great demand. In this paper, we propose a parallel multi-scale feature extraction convolution neural network (CNN) model combined with novel binocular feature interaction consistent with human visual system (HVS). In order to simulate the characteristics of HVS sensing multi-scale information at the same time, parallel multi-scale feature extraction module (PMSFM) followed by compensation information is proposed. And modified convolutional block attention module (MCBAM) with less computational complexity is designed to generate visual attention maps for the multi-scale features extracted by the PMSFM. In addition, we employ cross-stacked strategy for multi-level binocular fusion maps and binocular disparity maps to simulate the hierarchical perception characteristics of HVS. Experimental results show that our method is superior to the state-of-the-art metrics and achieves an excellent performance.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131446027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Learning-based Approach for Martian Image Compression
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008891
Qing Ding, Mai Xu, Shengxi Li, Xin Deng, Qiu Shen, Xin Zou
{"title":"A Learning-based Approach for Martian Image Compression","authors":"Qing Ding, Mai Xu, Shengxi Li, Xin Deng, Qiu Shen, Xin Zou","doi":"10.1109/VCIP56404.2022.10008891","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008891","url":null,"abstract":"For the scientific exploration and research on Mars, it is an indispensable step to transmit high-quality Martian images from distant Mars to Earth. Image compression is the key technique given the extremely limited Mars-Earth bandwidth. Recently, deep learning has demonstrated remarkable performance in natural image compression, which provides a possibility for efficient Martian image compression. However, deep learning usually requires large training data. In this paper, we establish the first large-scale high-resolution Martian image compression (MIC) dataset. Through analyzing this dataset, we observe an important non-local self-similarity prior for Marian images. Benefiting from this prior, we propose a deep Martian image compression network with the non-local block to explore both local and non-local dependencies among Martian image patches. Experimental results verify the effectiveness of the proposed network in Martian image compression, which outperforms both the deep learning based compression methods and HEVC codec.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122093323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Fast Motion Estimation Method With Hamming Distance for LiDAR Point Cloud Compression
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008842
Yuhao An, Yiting Shao, Ge Li, Wei Gao, Shan Liu
{"title":"A Fast Motion Estimation Method With Hamming Distance for LiDAR Point Cloud Compression","authors":"Yuhao An, Yiting Shao, Ge Li, Wei Gao, Shan Liu","doi":"10.1109/VCIP56404.2022.10008842","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008842","url":null,"abstract":"With more three-dimensional space information, Light detection and ranging (LiDAR) point clouds, which are promising to play more roles in the future, have an urgent need to be efficiently compressed. There are lots of compression methods based on spatial correlations, whereas few studies consider exploiting temporal correlations. In this paper, we propose a different perspective for the motion estimation. In most previous works, geometric distance between matching points was used as the criterion, which has an expensive computational cost and is not accurate. We first propose the Hamming distance between the octree's nodes, instead of the geometric distance between per point which is a more direct criterion. We have implemented our method in the MPEG (Moving Picture Expert Group) Geometry-based PCC (Point Cloud Compression) inter-exploration (G-PCC Inter-EM). Experimental results show our method can provide the average 3.5 % bitrate savings and 92.5 % encoding speed increase in lossless geometric coding, compared to the G-PCC Inter-EM.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115316715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Frequency-aware Learned Image Compression for Quality Scalability
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008818
Hyomin Choi, Fabien Racapé, Shahab Hamidi-Rad, Mateen Ulhaq, Simon Feltman
{"title":"Frequency-aware Learned Image Compression for Quality Scalability","authors":"Hyomin Choi, Fabien Racapé, Shahab Hamidi-Rad, Mateen Ulhaq, Simon Feltman","doi":"10.1109/VCIP56404.2022.10008818","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008818","url":null,"abstract":"Spatial frequency analysis and transforms serve a central role in most engineered image and video lossy codecs, but are rarely employed in neural network (NN)-based approaches. We propose a novel NN-based image coding framework that utilizes forward wavelet transforms to decompose the input signal by spatial frequency. Our encoder generates separate bitstreams for each latent representation of low and high frequencies. This enables our decoder to selectively decode bitstreams in a quality-scalable manner. Hence, the decoder can produce an enhanced image by using an enhancement bitstream in addition to the base bitstream. Furthermore, our method is able to enhance only a specific region of interest (ROI) by using a corresponding part of the enhancement latent representation. Our experiments demonstrate that the proposed method shows competitive rate-distortion performance compared to several non-scalable image codecs. We also showcase the effectiveness of our two-level quality scalability, as well as its practicality in ROI quality enhancement.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124901426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008845
Jiade Liu, Cheolkon Jung
{"title":"CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation","authors":"Jiade Liu, Cheolkon Jung","doi":"10.1109/VCIP56404.2022.10008845","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008845","url":null,"abstract":"Since a huge amount of datasets is required for semantic segmentation, few shot semantic segmentation has attracted more and more attention of researchers. It aims to achieve semantic segmentation for unknown categories from only a small number of annotated training samples. Existing models for few shot semantic segmentation directly generate segmentation results and concentrate on learning the relationship between pixels, thus ignoring the spatial structure of features and decreasing the model learning ability. In this paper, we propose a coarse-to-fine network for few shot semantic segmentation, named CFNet. Firstly, we design a region selection module based on prototype learning to select the approximate region corresponding to the unknown category of the query image. Secondly, we elaborately combine the attention mechanism with the convolution module to learn the spatial structure of features and optimize the selected region. For the attention mechanism, we combine channel attention with self-attention to enhance the model ability of exploring the spatial structure of features and the pixel-wise relationship between support and query images. Experimental results show that CFNet achieves 65.2% and 70.1% in mean-IoU (mIoU) on PASCAL-5i for 1-shot and 5-shot settings, respectively, and outperforms state-of-the-art methods by 1.0%.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124444725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Blind Gaussian Deep Denoiser Network using Multi-Scale Pixel Attention
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008856
Ramesh Kumar Thakur, S. K. Maji
{"title":"Blind Gaussian Deep Denoiser Network using Multi-Scale Pixel Attention","authors":"Ramesh Kumar Thakur, S. K. Maji","doi":"10.1109/VCIP56404.2022.10008856","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008856","url":null,"abstract":"Many deep learning networks focus on the task of Gaussian denoising by processing images on a fixed scale or multiple scales using convolution and deconvolution. In certain cases, excessive scaling applied in the network results in the loss of image details. Sometimes, the usage of deeper convolutional networks results in the loss of network gradient. In this paper, to overcome both the problems, we propose a multi-scale pixel attention-based blind Gaussian denoiser network that utilizes a combination of important features at five different scales. The proposed network performs blind Gaussian denoising in the sense that it does not need any prior information about noise. It comprises a central multi-scale pixel attention block together with dilated convolutional layers and skip connections that help in utilizing the full receptive field of the first convolutional layer to the last convolutional layer and is based on residual architecture for propagating high-level information easily in the network. We have provided the code of the proposed technique at https://github.com/RTSIR/MSPABDN.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131564397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
No Reference Stereoscopic Video Quality Assessment based on Human Vision System
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008866
Xiaofang Zhang, Sumei Li
{"title":"No Reference Stereoscopic Video Quality Assessment based on Human Vision System","authors":"Xiaofang Zhang, Sumei Li","doi":"10.1109/VCIP56404.2022.10008866","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008866","url":null,"abstract":"In this paper, we propose a no-reference stereoscopic video quality assessment (NR-SVQA) based on human vision system (HVS). Firstly, we build a frequency transform module (FTM), which maps spatial domain to frequency domain by cosine discrete transform (DCT), and selects important frequency components through channel attention mechanism. Secondly, we use dynamic convolution to regionally process the same input. Thirdly, we use convolutional long short term memory (Conv-LSTM) to extract spatio-temporal information rather than just temporal information. Finally, in order to better simulate the visual characteristics of human eyes, we build a optic chiasm module. The experiment results show that our method outperforms any other methods.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131028001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalized Gaussian Distribution Based Distortion Model for the H.266/VVC Video Coder
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008905
Hongkui Wang, Junhui Liang, Li Yu, Y. Gu, Haibing Yin
{"title":"Generalized Gaussian Distribution Based Distortion Model for the H.266/VVC Video Coder","authors":"Hongkui Wang, Junhui Liang, Li Yu, Y. Gu, Haibing Yin","doi":"10.1109/VCIP56404.2022.10008905","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008905","url":null,"abstract":"In versatile video coding (VVC), superior coding performance is achieved with incorporating many advanced coding tools. In this paper, a frame-level coding distortion model is proposed for VVC video coders for the first time. In comparison with the transform coefficient distribution (TCD) of High Effective Video Coding (HEVC), the TCD of VVC has a sharper peak. According to this observation, the TCDs of I, B and P frames are modeled by the probability density function (PDF) of generalized Gaussian distribution (GGD) with three fixed shape parameters. The GGD-based distortion model is then derived with a sliding window-based strategy, i.e., the frame-level coding distortion is formulated as the function of the distribution parameter of frame-level TCD and the quantization step. The experimental results show that the proposed model achieves accurate results of distortion estimation for VVC coders.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115410518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CdCLR: Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition
2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) · Pub Date: 2022-12-13 · DOI: 10.1109/VCIP56404.2022.10008837
Rong Gao, Xin Liu, Jingyu Yang, Huanjing Yue
{"title":"CdCLR: Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition","authors":"Rong Gao, Xin Liu, Jingyu Yang, Huanjing Yue","doi":"10.1109/VCIP56404.2022.10008837","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008837","url":null,"abstract":"In this study, we propose a Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition (CdCLR). In-stead of considering sequences as instances, CdCLR extracts clips from the sequences as new instances. Aim to implement inherent supervision-guided contrastive learning through joint optimal training of sequences discrimination, clips discrimination, and order verification. Mining abundant positive/negative pairs inside sequence while learning inter-and intra-sequence semantic repre-sentations. Extensive experiments on the NTU RGB+D 60, UCLA and iMiGUE datasets present that CdCLR exhibits superior performance under various evaluation protocols and reaches state-of-the-art. Our code is available at https://github.com/Erich-G/CdCLRI.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"60 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114025618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0