2021 International Conference on Visual Communications and Image Processing (VCIP) — Latest Publications

Multi-Dimension Aware Back Projection Network For Scene Text Detection
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675323
Yizhan Zhao, Sumei Li, Yongli Chang
{"title":"Multi-Dimension Aware Back Projection Network For Scene Text Detection","authors":"Yizhan Zhao, Sumei Li, Yongli Chang","doi":"10.1109/VCIP53242.2021.9675323","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675323","url":null,"abstract":"Recently, scene text detection based on deep learning has progressed substantially. Nevertheless, most previous models with FPN are limited by the drawback of sample interpolation algorithms, which fail to generate high-quality up-sampled features. Accordingly, we propose an end-to-end trainable text detector to alleviate the above dilemma. Specifically, a Back Projection Enhanced Up-sampling (BPEU) block is proposed to alleviate the drawback of sample interpolation algorithms. It significantly enhances the quality of up-sampled features by employing back projection and detail compensation. Further-more, a Multi-Dimensional Attention (MDA) block is devised to learn different knowledge from spatial and channel dimensions, which intelligently selects features to generate more discriminative representations. Experimental results on three benchmarks, ICDAR2015, ICDAR2017- MLT and MSRA-TD500, demonstrate the effectiveness of our method.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130606417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DIRECT: Discrete Image Rescaling with Enhancement from Case-specific Textures
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675420
Yan-An Chen, Ching-Chun Hsiao, Wen-Hsiao Peng, Ching-Chun Huang
{"title":"DIRECT: Discrete Image Rescaling with Enhancement from Case-specific Textures","authors":"Yan-An Chen, Ching-Chun Hsiao, Wen-Hsiao Peng, Ching-Chun Huang","doi":"10.1109/VCIP53242.2021.9675420","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675420","url":null,"abstract":"This paper addresses image rescaling, the task of which is to downscale an input image followed by upscaling for the purposes of transmission, storage, or playback on heterogeneous devices. The state-of-the-art image rescaling network (known as IRN) tackles image downscaling and upscaling as mutually invertible tasks using invertible affine coupling layers. In particular, for upscaling, IRN models the missing high-frequency component by an input-independent (case-agnostic) Gaussian noise. In this work, we take one step further to predict a case-specific high-frequency component from textures embedded in the downscaled image. Moreover, we adopt integer coupling layers to avoid quantizing the downscaled image. When tested on commonly used datasets, the proposed method, termed DIRECT, improves high-resolution reconstruction quality both subjectively and objectively, while maintaining visually pleasing downscaled images.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133772093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Entropy-based Deep Product Quantization for Visual Search and Deep Feature Compression
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675383
Benben Niu, Ziwei Wei, Yun He
{"title":"Entropy-based Deep Product Quantization for Visual Search and Deep Feature Compression","authors":"Benben Niu, Ziwei Wei, Yun He","doi":"10.1109/VCIP53242.2021.9675383","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675383","url":null,"abstract":"With the emergence of various machine-to-machine and machine-to-human tasks with deep learning, the amount of deep feature data is increasing. Deep product quantization is widely applied in deep feature retrieval tasks and has achieved good accuracy. However, it does not focus on the compression target primarily, and its output is a fixed-length quantization index, which is not suitable for subsequent compression. In this paper, we propose an entropy-based deep product quantization algorithm for deep feature compression. Firstly, it introduces entropy into hard and soft quantization strategies, which can adapt to the codebook optimization and codeword determination operations in the training and testing processes respectively. Secondly, the loss functions related to entropy are designed to adjust the distribution of quantization index, so that it can accommodate to the subsequent entropy coding module. Experimental results carried on retrieval tasks show that the proposed method can be generally combined with deep product quantization and its extended schemes, and can achieve a better compression performance under near lossless condition.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131067395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Complex Event Recognition via Spatial-Temporal Relation Graph Reasoning
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675337
Hua Lin, Hongtian Zhao, Hua Yang
{"title":"Complex Event Recognition via Spatial-Temporal Relation Graph Reasoning","authors":"Hua Lin, Hongtian Zhao, Hua Yang","doi":"10.1109/VCIP53242.2021.9675337","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675337","url":null,"abstract":"Events in videos usually contain a variety of factors: objects, environments, actions, and their interaction relations, and these factors as the mid-level semantics can bridge the gap between the event categories and the video clips. In this paper, we present a novel video events recognition method that uses the graph convolution networks to represent and reason the logic relations among the inner factors. Considering that different kinds of events may focus on different factors, we especially use the transformer networks to extract the spatial-temporal features drawing upon the attention mechanism that can adaptively assign weights to concerned key factors. Although transformers generally rely more on large datasets, we show the effectiveness of applying a 2D convolution backbone before the transformers. We train and test our framework on the challenging video event recognition dataset UCF-Crime and conduct ablation studies. The experimental results show that our method achieves state-of-the-art performance, outperforming previous principal advanced models with a significant margin of recognition accuracy.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133499813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Real-time embedded hologram calculation for augmented reality glasses
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675435
Antonin Gilles
{"title":"Real-time embedded hologram calculation for augmented reality glasses","authors":"Antonin Gilles","doi":"10.1109/VCIP53242.2021.9675435","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675435","url":null,"abstract":"Thanks to its ability to provide accurate focus cues, Holography is considered as a promising display technology for augmented reality glasses. However, since it contains a large amount of data, the calculation of a hologram is a time-consuming process which results in prohibiting head-motion-to-photon latency, especially when using embedded calculation hardware. In this paper, we present a real-time hologram calculation method implemented on a NVIDIA Jetson AGX Xavier embedded platform. Our method is based on two modules: an offline pre-computation module and an on-the-fly hologram synthesis module. In the offline calculation module, the omnidirectional light field scattered by each scene object is individually pre-computed and stored in a Look-Up Table (LUT). Then, in the hologram synthesis module, the light waves corresponding to the viewer's position and orientation are extracted from the LUT in real-time to compute the hologram. Experimental results show that the proposed method is able to compute 2K1K color holograms at more than 50 frames per second, enabling its use in augmented reality applications.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131927286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Underwater Image Enhancement with Multi-Scale Residual Attention Network
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675342
Yosuke Ueki, M. Ikehara
{"title":"Underwater Image Enhancement with Multi-Scale Residual Attention Network","authors":"Yosuke Ueki, M. Ikehara","doi":"10.1109/VCIP53242.2021.9675342","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675342","url":null,"abstract":"Underwater images suffer from low contrast, color distortion and visibility degradation due to the light scattering and attenuation. Over the past few years, the importance of underwater image enhancement has increased because of ocean engineering and underwater robotics. Existing underwater image enhancement methods are based on various assumptions. However, it is almost impossible to define appropriate assumptions for underwater images due to the diversity of underwater images. Therefore, they are only effective for specific types of underwater images. Recently, underwater image enhancement algorisms using CNNs and GANS have been proposed, but they are not as advanced as other image processing methods due to the lack of suitable training data sets and the complexity of the issues. To solve the problems, we propose a novel underwater image enhancement method which combines the residual feature attention block and novel combination of multi-scale and multi-patch structure. Multi-patch network extracts local features to adjust to various underwater images which are often Non-homogeneous. In addition, our network includes multi-scale network which is often effective for image restoration. Experimental results show that our proposed method outperforms the conventional method for various types of images.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134552658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhanced Cross Component Sample Adaptive Offset for AVS3
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675321
Yunrui Jian, Jiaqi Zhang, Junru Li, Suhong Wang, Shanshe Wang, Siwei Ma, Wen Gao
{"title":"Enhanced Cross Component Sample Adaptive Offset for AVS3","authors":"Yunrui Jian, Jiaqi Zhang, Junru Li, Suhong Wang, Shanshe Wang, Siwei Ma, Wen Gao","doi":"10.1109/VCIP53242.2021.9675321","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675321","url":null,"abstract":"Cross-component prediction has great potential for removing the redundancy of multi-components. Recently, cross-component sample adaptive offset (CCSAO) was adopted in the third generation of Audio Video coding Standard (AVS3), which utilizes the intensities of co-located luma samples to determine the offsets of chroma sample filters. However, the frame-level based offset is rough for various content, and the edge information of classified samples is ignored. In this paper, we propose an enhanced CCSAO (ECCSAO) method to further improve the coding performance. Firstly, four selectable 1-D directional patterns are added to make the mapping between luma and chroma components more effectively. Secondly, one four-layer quad-tree based structure is designed to improve the filtering flexibility of CCSAO. Experimental results show that the proposed approach achieves 1.51%, 2.33% and 2.68% BD-rate savings for All-Intra (AI), Random-Access (RA) and Low Delay B (LD) configurations compared to AVS3 reference software, respectively. A subset improvement of ECCSAO has been adopted by AVS3.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"2 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126326869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Telemoji: A video chat with automated recognition of facial expressions
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675330
Alex Kreinis, Tom Damri, Tomer Leon, Marina Litvak, Irina Rabaev
{"title":"Telemoji: A video chat with automated recognition of facial expressions","authors":"Alex Kreinis, Tom Damri, Tomer Leon, Marina Litvak, Irina Rabaev","doi":"10.1109/VCIP53242.2021.9675330","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675330","url":null,"abstract":"Autism spectrum disorder (ASD) is frequently ac-companied by impairment in emotional expression recognition, and therefore individuals with ASD may find it hard to interpret emotions and interact. Inspired by this fact, we developed a web-based video chat to assist people with ASD, both for real-time recognition of facial emotions and for practicing. This real-time application detects the speaker's face in a video stream and classifies the expressed emotion into one of the seven categories: neutral, surprise, happy, angry, disgust, fear, and sad. The classification is then displayed as the text label below the speaker's face. We developed this application as a part of the undergraduate project for the B.Sc. degree in Software Engineering. Its development and testing were made with the cooperation of the local society for children and adults with autism. The application has been released for unrestricted use on https://telemojii.herokuapp.com/. The demo is available at http://www.filedropper.com/telemojishortdemoblur.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"45 Suppl 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126390728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Pixel Gradient Based Zooming Method for Plenoptic Intra Prediction
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675380
Fan Jiang, Xin Jin, Kedeng Tong
{"title":"Pixel Gradient Based Zooming Method for Plenoptic Intra Prediction","authors":"Fan Jiang, Xin Jin, Kedeng Tong","doi":"10.1109/VCIP53242.2021.9675380","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675380","url":null,"abstract":"Plenoptic 2.0 videos that record time-varying light fields by focused plenoptic cameras are prospective to immersive visual applications due to capturing dense sampled light fields with high spatial resolution in the rendered sub-apertures. In this paper, an intra prediction method is proposed for compressing multi-focus plenoptic 2.0 videos efficiently. Based on the estimation of zooming factor, novel gradient-feature-based zooming, adaptive-bilinear-interpolation-based tailoring and inverse-gradient-based boundary filtering are proposed and executed sequentially to generate accurate prediction candidates for weighted prediction working with adaptive skipping strategy. Experimental results demonstrate the superior performance of the proposed method relative to HEVC and state-of-the-art methods.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"2022 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127601765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Reinforcement Learning based ROI Bit Allocation for Gaming Video Coding in VVC
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675345
Guangjie Ren, Zizheng Liu, Zhenzhong Chen, Shan Liu
{"title":"Reinforcement Learning based ROI Bit Allocation for Gaming Video Coding in VVC","authors":"Guangjie Ren, Zizheng Liu, Zhenzhong Chen, Shan Liu","doi":"10.1109/VCIP53242.2021.9675345","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675345","url":null,"abstract":"In this paper, we propose a reinforcement learning based region of interest (ROI) bit allocation method for gaming video coding in Versatile Video Coding (VVC). Most current ROI-based bit allocation methods rely on bit budgets based on frame-level empirical weight allocation. The restricted bit budgets influence the efficiency of ROI-based bit allocation and the stability of video quality. To address this issue, the bit allocation process of frame and ROI are combined and formulated as a Markov decision process (MDP). A deep reinforcement learning (RL) method is adopted to solve this problem and obtain the appropriate bits of frame and ROI. Our target is to improve the quality of ROI and reduce the frame-level quality fluctuation, whilst satisfying the bit budgets constraint. The RL-based ROI bit allocation method is implemented in the latest video coding standard and verified for gaming video coding. The experimental results demonstrate that the proposed method achieves a better quality of ROI while reducing the quality fluctuation compared to the reference methods.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"15 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121005484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4