{"title":"STSI: Efficiently Mine Spatio- Temporal Semantic Information between Different Multimodal for Video Captioning","authors":"Huiyu Xiong, Lanxiao Wang","doi":"10.1109/VCIP56404.2022.10008808","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008808","url":null,"abstract":"As one of the challenging tasks in computer vision, video captioning needs to use natural language to describe the content of video. Video contains complex information, such as semantic information, time information and so on. How to synthesize sentences effectively from rich and different kinds of information is very significant. The existing methods often cannot well integrate the multimodal feature to predict the association between different objects in video. In this paper, we improve the existing encoder-decoder structure and propose a network deeply mining the spatio-temporal correlation between multimodal features. Through the analysis of sentence components, we use spatio-temporal semantic information mining module to fuse the object, 2D and 3D features in both time and space. It is worth mentioning that the word output at the previous time is added as the prediction branch of auxiliary conjunctions. After that, a dynamic gumbel scorer is used to output caption sentences that are more consistent with the facts. The experimental results on two benchmark datasets show that our STSI is superior to the state-of-the-art methods while generating more reasonable and semantic-logical sentences.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123002008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-reference Stereoscopic Image Quality Assessment Based on Parallel Multi-scale Perception","authors":"Ziyi Zhang, Sumei Li","doi":"10.1109/VCIP56404.2022.10008875","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008875","url":null,"abstract":"With the rapid development of 3D technologies, effective no-reference stereoscopic image quality assessment (NR-SIQA) methods are in great demand. In this paper, we propose a parallel multi-scale feature extraction convolution neural network (CNN) model combined with novel binocular feature interaction consistent with human visual system (HVS). In order to simulate the characteristics of HVS sensing multi-scale information at the same time, parallel multi-scale feature extraction module (PMSFM) followed by compensation information is proposed. And modified convolutional block attention module (MCBAM) with less computational complexity is designed to generate visual attention maps for the multi-scale features extracted by the PMSFM. In addition, we employ cross-stacked strategy for multi-level binocular fusion maps and binocular disparity maps to simulate the hierarchical perception characteristics of HVS. Experimental results show that our method is superior to the state-of-the-art metrics and achieves an excellent performance.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131446027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Learning-based Approach for Martian Image Compression","authors":"Qing Ding, Mai Xu, Shengxi Li, Xin Deng, Qiu Shen, Xin Zou","doi":"10.1109/VCIP56404.2022.10008891","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008891","url":null,"abstract":"For the scientific exploration and research on Mars, it is an indispensable step to transmit high-quality Martian images from distant Mars to Earth. Image compression is the key technique given the extremely limited Mars-Earth bandwidth. Recently, deep learning has demonstrated remarkable performance in natural image compression, which provides a possibility for efficient Martian image compression. However, deep learning usually requires large training data. In this paper, we establish the first large-scale high-resolution Martian image compression (MIC) dataset. Through analyzing this dataset, we observe an important non-local self-similarity prior for Marian images. Benefiting from this prior, we propose a deep Martian image compression network with the non-local block to explore both local and non-local dependencies among Martian image patches. Experimental results verify the effectiveness of the proposed network in Martian image compression, which outperforms both the deep learning based compression methods and HEVC codec.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122093323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Motion Estimation Method With Hamming Distance for LiDAR Point Cloud Compression","authors":"Yuhao An, Yiting Shao, Ge Li, Wei Gao, Shan Liu","doi":"10.1109/VCIP56404.2022.10008842","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008842","url":null,"abstract":"With more three-dimensional space information, Light detection and ranging (LiDAR) point clouds, which are promising to play more roles in the future, have an urgent need to be efficiently compressed. There are lots of compression methods based on spatial correlations, whereas few studies consider exploiting temporal correlations. In this paper, we propose a different perspective for the motion estimation. In most previous works, geometric distance between matching points was used as the criterion, which has an expensive computational cost and is not accurate. We first propose the Hamming distance between the octree's nodes, instead of the geometric distance between per point which is a more direct criterion. We have implemented our method in the MPEG (Moving Picture Expert Group) Geometry-based PCC (Point Cloud Compression) inter-exploration (G-PCC Inter-EM). Experimental results show our method can provide the average 3.5 % bitrate savings and 92.5 % encoding speed increase in lossless geometric coding, compared to the G-PCC Inter-EM.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115316715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency-aware Learned Image Compression for Quality Scalability","authors":"Hyomin Choi, Fabien Racapé, Shahab Hamidi-Rad, Mateen Ulhaq, Simon Feltman","doi":"10.1109/VCIP56404.2022.10008818","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008818","url":null,"abstract":"Spatial frequency analysis and transforms serve a central role in most engineered image and video lossy codecs, but are rarely employed in neural network (NN)-based approaches. We propose a novel NN-based image coding framework that utilizes forward wavelet transforms to decompose the input signal by spatial frequency. Our encoder generates separate bitstreams for each latent representation of low and high frequencies. This enables our decoder to selectively decode bitstreams in a quality-scalable manner. Hence, the decoder can produce an enhanced image by using an enhancement bitstream in addition to the base bitstream. Furthermore, our method is able to enhance only a specific region of interest (ROI) by using a corresponding part of the enhancement latent representation. Our experiments demonstrate that the proposed method shows competitive rate-distortion performance compared to several non-scalable image codecs. We also showcase the effectiveness of our two-level quality scalability, as well as its practicality in ROI quality enhancement.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124901426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation","authors":"Jiade Liu, Cheolkon Jung","doi":"10.1109/VCIP56404.2022.10008845","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008845","url":null,"abstract":"Since a huge amount of datasets is required for semantic segmentation, few shot semantic segmentation has attracted more and more attention of researchers. It aims to achieve semantic segmentation for unknown categories from only a small number of annotated training samples. Existing models for few shot semantic segmentation directly generate segmentation results and concentrate on learning the relationship between pixels, thus ignoring the spatial structure of features and decreasing the model learning ability. In this paper, we propose a coarse-to-fine network for few shot semantic segmentation, named CFNet. Firstly, we design a region selection module based on prototype learning to select the approximate region corresponding to the unknown category of the query image. Secondly, we elaborately combine the attention mechanism with the convolution module to learn the spatial structure of features and optimize the selected region. For the attention mechanism, we combine channel attention with self-attention to enhance the model ability of exploring the spatial structure of features and the pixel-wise relationship between support and query images. Experimental results show that CFNet achieves 65.2% and 70.1% in mean-IoU (mIoU) on PASCAL-5i for 1-shot and 5-shot settings, respectively, and outperforms state-of-the-art methods by 1.0%.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124444725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind Gaussian Deep Denoiser Network using Multi-Scale Pixel Attention","authors":"Ramesh Kumar Thakur, S. K. Maji","doi":"10.1109/VCIP56404.2022.10008856","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008856","url":null,"abstract":"Many deep learning networks focus on the task of Gaussian denoising by processing images on a fixed scale or multiple scales using convolution and deconvolution. In certain cases, excessive scaling applied in the network results in the loss of image details. Sometimes, the usage of deeper convolutional networks results in the loss of network gradient. In this paper, to overcome both the problems, we propose a multi-scale pixel attention-based blind Gaussian denoiser network that utilizes a combination of important features at five different scales. The proposed network performs blind Gaussian denoising in the sense that it does not need any prior information about noise. It comprises a central multi-scale pixel attention block together with dilated convolutional layers and skip connections that help in utilizing the full receptive field of the first convolutional layer to the last convolutional layer and is based on residual architecture for propagating high-level information easily in the network. We have provided the code of the proposed technique at https://github.com/RTSIR/MSPABDN.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131564397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No Reference Stereoscopic Video Quality Assessment based on Human Vision System","authors":"Xiaofang Zhang, Sumei Li","doi":"10.1109/VCIP56404.2022.10008866","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008866","url":null,"abstract":"In this paper, we propose a no-reference stereoscopic video quality assessment (NR-SVQA) based on human vision system (HVS). Firstly, we build a frequency transform module (FTM), which maps spatial domain to frequency domain by cosine discrete transform (DCT), and selects important frequency components through channel attention mechanism. Secondly, we use dynamic convolution to regionally process the same input. Thirdly, we use convolutional long short term memory (Conv-LSTM) to extract spatio-temporal information rather than just temporal information. Finally, in order to better simulate the visual characteristics of human eyes, we build a optic chiasm module. The experiment results show that our method outperforms any other methods.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131028001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Gaussian Distribution Based Distortion Model for the H.266/VVC Video Coder","authors":"Hongkui Wang, Junhui Liang, Li Yu, Y. Gu, Haibing Yin","doi":"10.1109/VCIP56404.2022.10008905","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008905","url":null,"abstract":"In versatile video coding (VVC), superior coding performance is achieved with incorporating many advanced coding tools. In this paper, a frame-level coding distortion model is proposed for VVC video coders for the first time. In comparison with the transform coefficient distribution (TCD) of High Effective Video Coding (HEVC), the TCD of VVC has a sharper peak. According to this observation, the TCDs of I, B and P frames are modeled by the probability density function (PDF) of generalized Gaussian distribution (GGD) with three fixed shape parameters. The GGD-based distortion model is then derived with a sliding window-based strategy, i.e., the frame-level coding distortion is formulated as the function of the distribution parameter of frame-level TCD and the quantization step. The experimental results show that the proposed model achieves accurate results of distortion estimation for VVC coders.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115410518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CdCLR: Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition","authors":"Rong Gao, Xin Liu, Jingyu Yang, Huanjing Yue","doi":"10.1109/VCIP56404.2022.10008837","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008837","url":null,"abstract":"In this study, we propose a Clip-Driven Contrastive Learning for Skeleton-Based Action Recognition (CdCLR). In-stead of considering sequences as instances, CdCLR extracts clips from the sequences as new instances. Aim to implement inherent supervision-guided contrastive learning through joint optimal training of sequences discrimination, clips discrimination, and order verification. Mining abundant positive/negative pairs inside sequence while learning inter-and intra-sequence semantic repre-sentations. Extensive experiments on the NTU RGB+D 60, UCLA and iMiGUE datasets present that CdCLR exhibits superior performance under various evaluation protocols and reaches state-of-the-art. Our code is available at https://github.com/Erich-G/CdCLRI.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"60 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114025618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}