{"title":"Visual Comfort Classification for Stereoscopic Videos Based on Two-Stream Recurrent Neural Network with Multi-level Attention","authors":"Weize Gan, Danhong Peng, Yuzhen Niu","doi":"10.1145/3561613.3561628","DOIUrl":"https://doi.org/10.1145/3561613.3561628","url":null,"abstract":"Due to the differences in visual systems between children and adults, a professional stereoscopic 3D video may not be comfortable for children. In this paper, we aim to answer whether a stereoscopic video is comfortable for children to watch by solving the visual comfort classification for stereoscopic videos. In particular, we propose a two-stream recurrent neural network (RNN) with multi-level attention for the visual comfort classification for stereoscopic videos. Firstly, we propose a two-stream RNN to extract and fuse spatial and temporal features from video frames and disparity maps. Furthermore, we propose using multi-level attention to effectively enhance the features in frame level, shot level, and finally video level. In addition, to our best knowledge, we establish the first high-definition stereoscopic 3D video dataset for performance evaluation. Experimental results show that our proposed model can effectively classify professional stereoscopic videos into visually comfortable for children or adults only.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"354 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132583095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Fusion: Graph Attention Network and CNN Combing for Hyperspectral Image Classification","authors":"Qikun Pan, Xiaoxi Xu, Qi Chang, Chundi Pan, Guo Cao","doi":"10.1145/3561613.3561640","DOIUrl":"https://doi.org/10.1145/3561613.3561640","url":null,"abstract":"Graph convolutional networks (GCNs) have attracted increasing attention in hyperspectral image classification. However, most of the available GCN-based HSI classification methods treat superpixels as graph nodes, ignoring pixel-level spectral spatial features. In this paper, we propose a novel Feature Fusion Network (FFGCN), which is composed of two different convolutional networks, namely Graph Attention Network (GAT) and Convolutional Neural Network (CNN). Among them, superpixel-based GAT can deal with the problem of labeled deficiency and extract spatial features from HSI. Attention-based multi-scale CNN can extract multi-scale pixel local features for HSI classification. Finally, the features of the two neural network models are fused and used for classification. Rigorous experiments on two real HSI datasets show that FFGCN achieves better experimental results and is competitive with other state-of-the-art methods.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"24 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114030382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Robust Single Sperm Tracking via Adaptive Particle Filtering","authors":"Fengling Meng, Yinran Chen, Xióngbiao Luó","doi":"10.1145/3561613.3561638","DOIUrl":"https://doi.org/10.1145/3561613.3561638","url":null,"abstract":"Assisted reproductive technology is commonly used to treat infertility. Motility-based selection of high-quality sperms is the key to improve the successful rate of artificial assisted reproduction. Visually tracking the sperms on optical microscopic video frames is essential to evaluate their motility before the selection. Unfortunately, current methods easily fail to precisely track the sperms in real time. This work is to accurately and robustly detect and track single sperm based on microscopic video frames. We propose a modified background subtraction method to detect multiple sperms in successive frames. We also introduce an adaptive particle filtering method to accurately and robustly track the trajectory of a single sperm in real time. Specifically, this method models the sperm movement by comparing its histogram information at different positions on microscopic images and uses adaptive particle filtering to approximate the optimal state of the sperm. The experimental results demonstrate that our method can achieve much better tracking accuracy than other visual tracking methods, providing more reliable sperm motility analysis. In particular, our method can successfully re-track the same sperm when it appears again on the microscopic focal plane after disappearing in a few frames, while the other compared tracking methods usually fail to re-track the same sperm after its back.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121291802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“Presence” and “Empathy” — Design and Implementation Emotional Interactive Storytelling for Virtual Character","authors":"Manyu Zhang","doi":"10.1145/3561613.3561632","DOIUrl":"https://doi.org/10.1145/3561613.3561632","url":null,"abstract":"One of the key motivators for participating in Virtual Reality (VR) is the opportunity to and appeal of becoming immersed in a virtual environment. One avenue that is anticipated to have significant expansion is storytelling through VR, as it offers novel and absorbing experiences. To develop a design interactive storytelling program using VR-based coding, examples of VR application and coding storytelling were analyzed. Base on this analysis, we developed one design interactive storytelling featuring a virtual environment that supports the facilitation of such experiences. In this paper, we introduce expands the interactive storytelling structure, both in general and for VR. The current interactive storytelling systems are extended via emotional modeling and tracking. The components being proposed are to supplement the story segments with information about the response anticipated from users, a modeled emotional path for the individual emotional categories linked to the story, and an internal system to track emotions, in a bid to predict the users’ present emotional condition. We also show the results of the implementation with the 43 students (age 18-28) that demonstrate the emotional expression for the use of interactive storytelling. The results showed that virtual interactive storytelling, the usability of the system and the impact of plot development on inference and story understanding.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130212809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid-Spatial Transformer for Image Captioning","authors":"Jincheng Zheng, Chi-Man Pun","doi":"10.1145/3561613.3561617","DOIUrl":"https://doi.org/10.1145/3561613.3561617","url":null,"abstract":"Recent years, the transformer-based model has achieved great success in many tasks such as machine translation. This encoder-decoder architecture is proved to be useful for image captioning tasks as well. We propose a novel Hybrid-Spatial Transformer model for image captioning. In this work, we combine the Global information and Local information of image as input of encoder which extracted by VGG16 and Faster R-CNN respectively. To further improve the performance of model, we add spatial information to attention layer by incorporating geometry features to attention weight. What’s more, queries Q, keys K, values V are a bit different from standard transformer, which is reflected in theses aspects. The positional encoding or embedding is not added to values V both encoder and decoder, the positional embedding is added to keys K on cross-attention. The experimental results illustrate that our model can achieve state-of-the art performance on CIDEr-D, METEROR and BLEU-1 on MS-COCO dataset.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122456074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency Domain Spline Prioritization Optimization Adaptive Filters","authors":"Wenyan Guo, Yongfeng Zhi, Zhe Zhang, Honggang Gao","doi":"10.1145/3561613.3561645","DOIUrl":"https://doi.org/10.1145/3561613.3561645","url":null,"abstract":"The spline prioritization optimization adaptive filter (SPOAF) is a nonlinear filtering algorithm with a relatively simple architecture. It is composed of the FIR filter cascaded a nonlinear interpolation module. When the length of the FIR filter is long, the computational complexity will increase exponentially. To solve this problem, this paper proposes a frequency domain spline prioritization optimization adaptive filter (FDSPOAF). More specifically, the FIR filter is implemented in the frequency domain, using the fast Fourier transform and its inverse transform, which converts convolution in the time domain into multiplication in the frequency domain. This paper describes the detailed steps of the FDSPOAF method and analyzes the computational complexity. Finally, it is verified by numerical experiments that the algorithm can reduce the operation time. Compared with the traditional SPOAF algorithm, the proposed FDSPOAF algorithm can effectively reduce the operation time of the algorithm with comparable convergence performance.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132440114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Masked Face Recognition Using MobileNetV2","authors":"Ming Liu, Wei Yan","doi":"10.1145/3561613.3561650","DOIUrl":"https://doi.org/10.1145/3561613.3561650","url":null,"abstract":"Masked face recognition has made great progress in the field of computer vision since the popularity of COVID-19 epidemic in 2020. In countries with severe outbreaks, people are required to wear masks in public. The current face recognition methods, which take use of the whole face as input data, are quite well established. However, while people are use of face masks, it will reduce the accuracy of face recognition. Therefore, we propose a mask wearing recognition method based on MobileNetV2 and solve the problem that many of models cannot be applied to portable devices or mobile terminals. The results indicate that this method has 98.30% accuracy in identifying the masked face. Simultaneously, a higher accuracy is obtained compared to VGG16. This approach has proven to be working well for the practical needs.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129325737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pulmonary Nodule Detection Based on RPN with Squeeze‐and‐Excitation Block","authors":"Xiaoxi Lu, Xingyue Wang, Jiansheng Fang, Na Zeng, Yao Xiang, Jingfeng Zhang, Jianjun Zheng, Jiang Liu","doi":"10.1145/3561613.3561627","DOIUrl":"https://doi.org/10.1145/3561613.3561627","url":null,"abstract":"Early detection of lung cancer is a crucial step to improve the chances of survival. To detect the pulmonary nodules, various methods are proposed including one-stage object detection methods (e.g., YOLO, SSD) and two-stage detection methods(e.g., Faster RCNN). Two-stage methods are more accurate than one-stage, thus more likely used in the detection of a small object. Faster RCNN as a two-stage method, ensuring more efficient and accurate region proposal generation, is consistent with our task’s objective, that is, detecting small 3-D nodules from large CT image volume. Therefore, in our work, we used 3-D region proposal network (RPN) proposed in Faster RCNN to detect nodules. However, different from natural images with clear boundaries and textures, pulmonary nodules have different types and locations, which are hard to recognize. Thus with the thought that if the network can learn more features of the nodules, the performance would be better, we also applied the \"Squeeze-and-Excitation\" blocks to the 3-D RPN, which we term it as SE-Res RPN. The experimental results show that the sensitivity of SE-Res RPN in 10-fold cross-validation of LUNA 16 is 93.7 , which achieves great performance without a false positive reduction stage.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114886270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-stage Citrus Detection based on Improved Yolov4","authors":"Bingliang Yi, Bin Kong, C. Xu","doi":"10.1145/3561613.3561623","DOIUrl":"https://doi.org/10.1145/3561613.3561623","url":null,"abstract":"At present, the research of Citrus recognition is basically aimed at the detection of Citrus in mature stage. This paper proposes a citrus detection algorithm based on improved yolov4, which can detect citrus in each growth stage. Based on yolov4, Introducing CBAM attention mechanism to improve the feature extraction ability of backbone networks; Increase the 22nd layer output of feature extraction network to improve the small target detection rate; A short connection feature fusion method is designed to increase the utilization of shallow feature information; Add a detection head with a scale of 152 * 152 for small-scale targets. It is proved by experiments on the self-built citrus data set, the improved CBAM-F-YOLOv4 can effectively detect citrus in each stage, and the mean Average Precision (mAP) is 6.2 percentage points higher than the original algorithm, reaching 87.3%. The detection results show that the improved algorithm greatly improves the detection ability of occlusion、 overlap and small-scale citrus.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128450691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ski Fall Detection from Digital Images Using Deep Learning","authors":"Yulin Zhu, Wei Yan","doi":"10.1145/3561613.3561625","DOIUrl":"https://doi.org/10.1145/3561613.3561625","url":null,"abstract":"In this paper, we explore how to take advantage of computer vision to assist ski resorts and monitor the safety of skiers on the tracks. In order to quickly detect any falls or injures, and provide first aid for injured people, we make use of archived ski videos, which are employed to explore the possibility of skiers fall detection. Throughout combinations of visual object detection with human pose detection by using deep learning methods. Our ultimate goal of this project is to provide a way for ski safety monitoring which has potential applications for physical training. Our contribution in this paper is to propose a fall detection method suitable for skiers based on visual object detection, we have obtained 0.94 mAP accuracy in preliminary tests.","PeriodicalId":348024,"journal":{"name":"Proceedings of the 5th International Conference on Control and Computer Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130604182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}