{"title":"Texture-aware Network for Smoke Density Estimation","authors":"Xue Xia, K. Zhan, Yajing Peng, Yuming Fang","doi":"10.1109/VCIP56404.2022.10008826","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008826","url":null,"abstract":"Smoke density estimation, also termed as soft segmentation, was developed from pixel-wise smoke (hard) segmen-tation and it aims at providing transparency and segmentation confidence for each pixel. The key difference between them lies in that segmentation focuses on classifying pixels into smoke and non-smoke ones, while density estimation obtains inner transparency of smoke component rather than treat all smoke pixels as an equal value. Based on this, we propose a texture-aware network being able to capture inner transparency of smoke components rather than merely focus on general smoke distribution for pixel-wise smoke density estimation. Besides, we adapt the Squeeze-and-Excitation (SE) layer for smoke feature extraction by involving max values for robustness. In order to represent inhomogeneous smoke pixels, we proposed a simple yet efficient attention-based texture-aware module that involves both gradient and semantic information. Experimental results show that our method outperforms others in both single image density estimation or segmentation and video smoke detection.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121663119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RGBD-based Real-time Volumetric Reconstruction System: Architecture Design and Implementation","authors":"Kai Zhou, Shuai Guo, Jin-Song Hu, Jiong-Qi Wang, Qiuwen Wang, Li Song","doi":"10.1109/VCIP56404.2022.10008839","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008839","url":null,"abstract":"With the increasing popularity of commercial depth cameras, 3D reconstruction of dynamic scenes has aroused widespread interest. Although many novel 3D applications have been unlocked, real-time performance is still a big problem. In this paper, a low-cost, real-time system: LiveRecon3D, is presented, with multiple RGB-D cameras connected to one single computer. The goal of the system is to provide an interactive frame rate for 3D content capture and rendering at a reduced cost. In the proposed system, we adopt a scalable volume structure and employ ray casting technique to extract the surface of 3D content. Based on a pipeline design, all the modules in the system run in parallel and are designed to minimize the latency to achieve an interactive frame rate of 30 FPS. At last, experimental results corresponding to implementation with three Kinect v2 cameras are presented to verify the system's effectiveness in terms of visual quality and real-time performance.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125343245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"History-parameter-based Affine Model Inheritance","authors":"Kai Zhang, Li Zhang, Z. Deng, Na Zhang, Yang Wang","doi":"10.1109/VCIP56404.2022.10008881","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008881","url":null,"abstract":"In VVC, affine motion compensation (AMC) is a powerful coding tool to address non-translational motion, while history-based motion vector prediction (HMVP) is an efficient approach to compress motion vectors. However, HMVP was designed for translational motion vectors, without considering control point motion vectors (CPMV) for AMC. This paper presents a method of history-parameter-based affine model inheritance (HAMI), to utilize history information to represent CPMV more efficiently. With HAMI, affine parameters of previously affine-coded block are stored in a first history-parameter table (HPT). New affine-merge, affine motion vector prediction candidates and regular-merge candidates can be constructed with affine parameters fetched from the first HPT and base MVs fetched from neighbouring blocks in a “base-parameter-decoupled” way. New affine merge candidates can also be generated in a “base-parameter-coupled” way from a second HPT, which stores base MV information together with corresponding affine parameters. Besides, pair-wised affine merge candidates are generated by two existing affine merge candidates. Experimental results show that HAMI provides an average BD-rate saving about 0.34 % with a negligible change on the running time, compared with ECM-3.1 in random access configurations. HAMI has been adopted into ECM.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116696660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Layer Feature based Multi-Granularity Visual Classification","authors":"Junhan Chen, Dongliang Chang, Jiyang Xie, Ruoyi Du, Zhanyu Ma","doi":"10.1109/VCIP56404.2022.10008879","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008879","url":null,"abstract":"In contrast to traditional fine-grained visual clas-sification, multi-granularity visual classification is no longer limited to identifying the different sub-classes belonging to the same super-class (e.g., bird species, cars, and aircraft models). Instead, it gives a sequence of labels from coarse to fine (e.g., Passeriformes → Corvidae → Fish Crow), which is more convenient in practice. The key to solving this task is how to use the relationships between the different levels of labels to learn feature representations that contain different levels of granularity. Interestingly, the feature pyramid structure naturally implies different granularity of feature representation, with the shallow layers representing coarse-grained features and the deep layers representing fine-grained features. Therefore, in this paper, we exploit this property of the feature pyramid structure to decouple features and obtain feature representations corre-sponding to different granularities. Specifically, we use shallow features for coarse-grained classification and deep features for fine-grained classification. In addition, to enable fine-grained features to enhance the coarse-grained classification, we propose a feature reinforcement module based on the feature pyramid structure, where deep features are first upsampled and then combined with shallow features to make decisions. Experimental results on three widely used fine-grained image classification datasets such as CUB-200-2011, Stanford Cars, and FGVC-Aircraft validate the method's effectiveness. Code available at https://github.com/PRIS-CV/CGVC.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116813530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Compensation Based Dual-Stream Feature Interaction Network for Multi-oriented Scene Text Detection","authors":"Siyan Wang, Sumei Li","doi":"10.1109/VCIP56404.2022.10008831","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008831","url":null,"abstract":"Due to the various appearances of scene text instances and the disturbance of background, it is still a challenging task to design an effective and accurate text detector. To tackle this problem, in this paper we propose a novel dual-stream scene text detector considering semantic compensation and feature interaction. The detector extracts image features from two input images of different resolution, which improves its perceptive ability and contributes to detecting large and long texts. Specifically, we propose a Semantic Compensation Module (SCM) to aggregate features between the two streams, which compensates semantic information in features at each level via an attention mechanism. Moreover, we design a Feature Interaction Module (FIM) to obtain more expressive features. Experiments conducted on three benchmark datasets, ICDAR2015, MSRA-TD500 and ICDAR2017-MLT, demonstrate that our proposed method has competitive performance and strong robustness.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116118307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autoencoder-based intra prediction with auxiliary feature","authors":"Luhang Xu, Yue Yu, Haoping Yu, Dong Wang","doi":"10.1109/VCIP56404.2022.10008846","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008846","url":null,"abstract":"A set of auto encoders is trained to perform intra prediction for block-based video coding. Each auto encoder consists of an encoding network and a decoding network. Both encoding network and decoding networks are jointly optimized and integrated into the state-of-the-art VVC reference software VTM-11.0 as an additional intra prediction mode. The simulation is conducted under common test conditions with all intra config-urations and the test results show 1.55%, 1.04%, and 0.99% of Y, U, V components Bjentegaard-Delta bit rate saving compared to VTM-11.0 anchor, respectively. The overall relative decoding running time of proposed autoencoder-based intra prediction mode on top of VTM-11.0 are 408% compared to VTM-11.0.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116722095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recurrent Network with Enhanced Alignment and Attention-Guided Aggregation for Compressed Video Quality Enhancement","authors":"Xiaodi Shi, Jucai Lin, Dong-Jin Jiang, Chunmei Nian, Jun Yin","doi":"10.1109/VCIP56404.2022.10008807","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008807","url":null,"abstract":"Recently, various compressed video quality enhancement technologies have been proposed to overcome the visual artifacts. Most existing methods are based on optical flow or deformable alignment to explore the spatiotemporal information across frames. However, inaccurate motion estimation and training instability of deformable convolution would be detrimental to the reconstruction performance. In this paper, we design a bi-directional recurrent network equipping with enhanced deformable alignment and attention-guided aggregation to promote information flows among frames. For the alignment, a pair of scale and shift parameters are learned to modulate optical flows into new offsets for deformable convolution. Furthermore, an attention aggregation strategy oriented at preference is designed for temporal information fusion. The strategy synthesizes global information of inputs to modulate features for effective fusion. Extensive experiments have proved that the proposed method achieves great performance in terms of quantitative performance and qualitative effect.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117134591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recurrent Multi-connection Fusion Network for Single Image Deraining","authors":"Yuetong Liu, Rui Zhang, Yunfeng Zhang, Yang Ning, Xunxiang Yao, Huijian Han","doi":"10.1109/VCIP56404.2022.10008893","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008893","url":null,"abstract":"Single image deraining is an important problem in many computer vision tasks because rain streaks can severely degrade the image quality. Recently, deep convolution neural network (CNN) based single image deraining methods have been developed with encouraging performance. However, most of these algorithms are designed by stacking convolutional layers, which encounter obstacles in learning abstract feature representation effectively and can only obtain limited features in the local region. In this paper, we propose a recurrent multi-connection fusion network (RMCFN) to remove rain streaks from single images. Specifically, the RMCFN employs two key components and multiple connections to fully utilize and transfer features. Firstly, we use a multi-scale fusion memory block (MFMB) to exploit multi-scale features and obtain long-range dependencies, which is beneficial to feed useful information to a later stage. Moreover, to efficiently capture the informative features on the transmission, we fuse the features of different levels and employ a multi-connection manner to use the information within and between stages. Finally, we develop a dual attention enhancement block (DAEB) to explore the valuable channel and spatial components and only pass further useful features. Extensive experiments verify the superiority of our method in visual effect and quantitative results compared to the state-of-the-arts.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127099942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Region-Level Contrastive Learning for Efficient Object Detection","authors":"Yuang Deng, Yuhang Zhang, Wenrui Dai, Xiaopeng Zhang, H. Xiong","doi":"10.1109/VCIP56404.2022.10008827","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008827","url":null,"abstract":"Semi-supervised learning, which assigns pseudo labels with models trained using limited labeled data, has been widely used in object detection to reduce the labeling cost. However, the provided pseudo annotations inevitably suffer noise since the initial model is not perfect. To address this issue, this paper introduces contrastive learning into semi-supervised object detection, and we claim that contrastive loss, which inherently relies on data augmentations, is much more robust than traditional softmax regression for noisy labels. To take full advantage of it in the detection task, we incorporate labels prior to contrastive loss and leverage plenty of region proposals to enhance diversity, which is crucial for contrastive learning. In this way, the model is optimized to make the region-level features with the same class be translation and scale invariant. Furthermore, we redesign the negative memory bank in contrastive learning to make the training more efficient. As far as we know, we are the first attempt that introduces contrastive learning in semi-supervised object detection. Experimental results on detection benchmarks demonstrate the superiority of our method. Notably, our method achieves 79.9% accuracy on VOC, which is 6.2% better than the supervised baseline and 0.7% improvement compared with the state-of-the-art method.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127105400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduced Reference Quality Assessment for Point Cloud Compression","authors":"Yipeng Liu, Qi Yang, Yi Xu","doi":"10.1109/VCIP56404.2022.10008813","DOIUrl":"https://doi.org/10.1109/VCIP56404.2022.10008813","url":null,"abstract":"In this paper, we propose a reduced reference (RR) point cloud quality assessment (PCQA) model named R-PCQA to quantify the distortions introduced by the lossy compression. Specifically, we use the attribute and geometry quantization steps of different compression methods (i.e., V-PCC, G-PCC and AVS) to infer the point cloud quality, assuming that the point clouds have no other distortions before compression. First, we analyze the compression distortion of point clouds under separate attribute compression and geometry compression to avoid their mutual masking, for which we consider 5 point clouds as references to generate a compression dataset (PCCQA) containing independent attribute compression and geometry compression samples. Then, we develop the proposed R-PCQA via fitting the relationship between the quantization steps and the perceptual quality. We evaluate the performance of R-PCQA on both the established dataset and another independent dataset. The results demonstrate that the proposed R-PCQA can exhibit reliable performance and high generalization ability.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127144779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}