{"title":"Data Transformer for Anomalous Trajectory Detection","authors":"Hsuan-Jen Psan, Wen-Jiin Tsai","doi":"10.1109/VCIP53242.2021.9675322","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675322","url":null,"abstract":"Anomaly detection is an important task in many traffic applications. Methods based on deep learning networks reach high accuracy; however, they typically rely on supervised training with large annotated data. Considering that anomalous data are not easy to obtain, we present data transformation methods which convert the data obtained from one intersection to other intersections to mitigate the effort of collecting training data. The proposed methods are demonstrated on the task of anomalous trajectory detection. A General model and a Universal model are proposed. The former focuses on saving data collection effort; the latter further reduces the network training effort. We evaluated the methods on the dataset with trajectories from four intersections in GTA V virtual world. The experimental results show that with significant reduction in data collecting and network training efforts, the proposed anomalous trajectory detection still achieves state-of-the-art accuracy.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"233 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122379890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Fly with a Video Generator","authors":"Chia-Chun Chung, Wen-Hsiao Peng, Teng-Hu Cheng, Chin-Feng Yu","doi":"10.1109/VCIP53242.2021.9675414","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675414","url":null,"abstract":"This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement the Dreamer proposed in a prior work as an environment model that responds to the action taken by the drone by predicting the next video frame as a new state signal. The Dreamer is a conditional video sequence generator. This model-based environment avoids the time-consuming interactions between the agent and the environment, speeding up largely the training process. This demonstration showcases for the first time the application of the Dreamer to train an agent that can finish the racing task in the Airsim simulator.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122381129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gradient Compression with a Variational Coding Scheme for Federated Learning","authors":"B. Kathariya, Zhu Li, Jianle Chen, G. V. D. Auwera","doi":"10.1109/VCIP53242.2021.9675436","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675436","url":null,"abstract":"Federated Learning (FL), a distributed machine learning architecture, emerged to solve the intelligent data analysis on massive data generated at network edge-devices. With this paradigm, a model is jointly learned in parallel at edge-devices without needing to send voluminous data to a central FL server. This not only allows a model to learn in a feasible duration by reducing network latency but also preserves data privacy. Nonetheless, when thousands of edge-devices are attached to an FL framework, limited network resources inevitably impose intolerable training latency. In this work, we propose model-update compression to solve this issue in a very novel way. The proposed method learns multiple Gaussian distributions that best describe the high dimensional gradient parameters. In the FL server, high dimensional gradients are repopulated from Gaussian distributions utilizing likelihood function parameters which are communicated to the server. Since the distribution information parameters constitute a very small percentage of values compared to the high dimensional gradients themselves, our proposed method is able to save significant uplink band-width while preserving the model accuracy. 
Experimental results validated our claim.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122423129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learn A Compression for Objection Detection - VAE with a Bridge","authors":"Yixin Mei, Fan Li, Li Li, Zhu Li","doi":"10.1109/VCIP53242.2021.9675387","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675387","url":null,"abstract":"Recent advances in sensor technology and wide deployment of visual sensors lead to a new application whereas compression of images are not mainly for pixel recovery for human consumption, instead it is for communication to cloud side machine vision tasks like classification, identification, detection and tracking. This opens up new research dimensions for a learning based compression that directly optimizes loss function in vision tasks, and therefore achieves better compression performance vis-a-vis the pixel recovery and then performing vision tasks computing. In this work, we developed a learning based compression scheme that learns a compact feature representation and appropriate bitstreams for the task of visual object detection. Variational Auto-Encoder (VAE) framework is adopted for learning a compact representation, while a bridge network is trained to drive the detection loss function. 
Simulation results demonstrate that this approach is achieving a new state-of-the-art in task driven compression efficiency, compared with pixel recovery approaches, including both learning based and handcrafted solutions.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124856899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Regression Mode of Intra Prediction for Screen Content Coding","authors":"Wei Peng, Hongkui Wang, Li Yu","doi":"10.1109/VCIP53242.2021.9675382","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675382","url":null,"abstract":"High Efficiency Video Coding - Screen Content Coding (HEVC-SCC) follows the traditional angular intra prediction technique in HEVC. However, the Planar mode and the DC mode are somewhat repetitive for screen content video with features such as no senor noise. Hence, this paper proposes a new intra prediction mode called linear regression (LR) mode, which combines the Planar mode and the DC mode into one mode. The LR mode improves the prediction accuracy of intra prediction for fading regions in screen content video. Besides, by optimizing the most probable mode (MPM) construction, the hit rate of the best mode in the MPM list is improved. The experimental results show that the proposed method can achieve 0.57% BD-BR reduction compared with HM $16.20+text{SCM} 8.8$, while the coding time remains largely the same.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126094622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the General and Technical Program Chairs","authors":"","doi":"10.1109/vcip53242.2021.9675415","DOIUrl":"https://doi.org/10.1109/vcip53242.2021.9675415","url":null,"abstract":"","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125450390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learn to Look Around: Deep Reinforcement Learning Agent for Video Saliency Prediction","authors":"Yiran Tao, Yaosi Hu, Zhenzhong Chen","doi":"10.1109/VCIP53242.2021.9675397","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675397","url":null,"abstract":"In the video saliency prediction task, one of the key issues is the utilization of temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the most correlated information of keyframe for saliency prediction. More specifically, an action set step by step decides whether to expand the window, meanwhile a state set and reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is followed to train the agent to learn a policy to achieve its goal. The proposed agent can be regarded as plug-and-play which is compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method can achieve an advanced performance.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116920216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DWS-BEAM: Decoder-Wise Subpicture Bitstream Extracting and Merging for MPEG Immersive Video","authors":"Jong-Beom Jeong, Soonbin Lee, Eun‐Seok Ryu","doi":"10.1109/VCIP53242.2021.9675419","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675419","url":null,"abstract":"With the new immersive video coding standard MPEG immersive video (MIV) and versatile video coding (VVC), six degrees of freedom (6DoF) virtual reality (VR) streaming technology is emerging for both computer-generated and natural content videos. This paper addresses the decoder-wise subpicture bitstream extracting and merging (DWS-BEAM) method for MIV and proposes two main ideas: (i) a selective streaming-aware subpicture allocation method using a motion-constrained tile set (MCTS), (ii) a decoder-wise subpicture extracting and merging method for single-pass decoding. In the experiments using the VVC test model (VTM), the proposed method shows 1.23% BD-rate saving for immersive video PSNR (IV-PSNR) and 15.78% decoding runtime saving compared to the VTM anchor. Moreover, while the MIV test model requires four decoders, the proposed method only requires one decoder.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132486723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMRD: A Local Feature Descriptor for Multi-modal Image Registration","authors":"Jiayu Xie, Xin Jin, Hongkun Cao","doi":"10.1109/VCIP53242.2021.9675401","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675401","url":null,"abstract":"Image registration among multimodality has received increasing attention in the scope of computer vision and computational photography nowadays. However, the non-linear intensity variations prohibit the accurate feature points matching between modal-different image pairs. Thus, a robust image descriptor for multi-modal image registration is proposed, named shearlet-based modality robust descriptor(SMRD). The anisotropic feature of edge and texture information in multi-scale is encoded to describe the region around a point of interest based on discrete shearlet transform. We conducted the experiments to verify the proposed SMRD compared with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results showed that our SMRD achieves superior performance than other methods in terms of precision, recall and F1-score.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131391529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Action Recognition on Raw Depth Maps","authors":"Jacek Trelinski, B. Kwolek","doi":"10.1109/VCIP53242.2021.9675349","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675349","url":null,"abstract":"We propose an effective framework for human action recognition on raw depth maps. We leverage a convolutional autoencoder to extract on sequences of deep maps the frame-features that are then fed to a 1D-CNN responsible for embedding action features. A Siamese neural network trained on repre-sentative single depth map for each sequence extracts features, which are then processed by shapelets algorithm to extract action features. These features are then concatenated with features extracted by a BiLSTM with TimeDistributed wrapper. Given the learned individual models on such features we perform a selection of a subset of models. We demonstrate experimentally that on SYSU 3DHOI dataset the proposed algorithm outperforms considerably all recent algorithms including skeleton-based ones.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131748864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}