{"title":"Object Detection in Optical Remote Sensing Images Based on Improved Lightweight Neural Network","authors":"Zhen Cheng, Jianshe Xiong, PengCheng Yang, Kai Yang, Yunnuo Chen","doi":"10.1109/ICIVC55077.2022.9886739","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886739","url":null,"abstract":"The optical remote sensing images collected by Unmanned Aerial Vehicle Remote Sensing (UAVRS) with real-time information, and object detection of the optical remote sensing images has significant development potential in the many fields such as transportation and agriculture. In addition to large objects such as buildings, small objects such as vehicles and ships can also be clearly observed in the collected high-resolution remote sensing images. This paper mainly focuses on the detection of vehicles and ships in remote sensing images, and proposes Scene-SSD based on the main principles of MobileNetV3 and SSD. In this paper, we improve the basic block bottleneck of MobileNetV3, introduce Generalized Focal Loss (GFL) function to replace the original loss function in SSD, improve the class imbalance problem and make the bounding box estimations are more precise, and the network model is trained by transfer learning to improve its generalization ability. It is experimentally illustrated that in object detection of remote sensing images, the Scene-SSD proposed in this paper is fast and the tested mAP can reach 77.9%, which is better than the MobileNetV3-SSDLite with the same network structure in the comparison test.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134276795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Method of Image Recognition with Deep Learning Combined with Attention Mechanism","authors":"Fang Xiaoyu, Wang Linlin, Liu Chang, Hong Tao","doi":"10.1109/ICIVC55077.2022.9887045","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9887045","url":null,"abstract":"An improved convolutional neural network (CNN) recognition model is proposed for the problems involving low recognition rate and weak generalization ability for flower images. Highly abstracted features after multiple convolutions are integrated, and the performance of network is improved by adding the network model for multi-attention mechanism after residual module for Inception-resnet-V2 Network and fully connected layer before activating the function. The improved model is simulated by integrating OxFlowers 17 and Oxford 102 flower data sets. The results show that the recognition rate of the model based on Inception-resnet-V2 Network combined with attention mechanism is up to 97.6%, being 5.1% higher than that of the original model, and the accuracy for flowers recognition is improved significantly.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129511985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Task-Driven Dual-Light Image Fusion and Enhancement Method under Low Illumination","authors":"Bokun Liu, Junyu Wei, Shaojing Su, Xiaozhong Tong","doi":"10.1109/ICIVC55077.2022.9886778","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886778","url":null,"abstract":"In low light situations, a single visible image can not transmit reliable information, even cause the loss of the target information. At this point, the advantages of visible and infrared image fusion will be highlighted. For a given pair of visible and infrared images, they are collectively referred to as dual-light images in this paper. How to make the most of their information and improve the information expression ability of the fused image is crucial. The traditional evaluation methods use statistical indicators, which is not associated with the upstream task. In this paper, the image fusion method driven by the target detection task is studied. Semantic loss is added to guide the dual-light image fusion. Moreover, through the visual enhancement module, the impact of adverse factors ( low light, etc. ) on the image is weakened, and the information expression level of the image is improved. Thus, the final image is more beneficial to target detection.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"508 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133088775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mobile Robot Path Planning Based on the Focused Heuristic Algorithm","authors":"Jia-Ming Lyu, Tian Ma, Wu Zhang, Yukun Yang","doi":"10.1109/ICIVC55077.2022.9886971","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886971","url":null,"abstract":"Aiming at the problems of low search efficiency, high search cost, and redundant search range in the traditional D* Lite algorithm in solving the path planning problem, the Focused D* Lite (FDL) algorithm is proposed. The proposed algorithm optimizes and adjusts the node and line respectively. Firstly, based on the current coordinates of mobile robots, the feasibility judgment and information transmission of obstacle information in eight neighborhoods are carried out to enhance the search capability of each step and ensure the effectiveness of the subsequent search. Secondly, the weight assignment is provided for the planned path to improve the concentration of the planned path, so that the algorithm can focus on the key and leading path, reduce the divergence of the algorithm, reduce invalid search and improve the efficiency of the algorithm planning. Simulation results show that the FDL algorithm is more efficient and also could maintain the same level of path quality.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115449540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Researches on the Emotion Recognition and Affective Computing Based on HCI","authors":"Wenqian Lin, Yunjian Zhang","doi":"10.1109/ICIVC55077.2022.9886306","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886306","url":null,"abstract":"Human-computer interaction (HCI) is the third revolution of information technology after cloud computing and big data. In the design of HCI, it usually involves physical level, cognitive level and emotional level, while emotion recognition and affective computing (ERAC) are the main contents of emotional level. In this paper, the concept and function of ERAC are described; the progress of research on ERAC from facial expression, voice, text, physiological signal and other aspects are analyzed; the application of ERAC in the computer science, health care, media entertainment, intelligent equipment, education and other fields are expound. Finally, in order to provide reference and basis for further research, the problems that need to be studied and future work are prospected.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124198925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrared and Visible Image Fusion Based on Biological Vision","authors":"Qianqian Han, Runping Xi, Qian Chen","doi":"10.1109/ICIVC55077.2022.9887132","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9887132","url":null,"abstract":"Infrared images can acquire salient targets, while visible images contain richer details. It is vital to fuse these two types of images. Benefiting from the existence of the dual-mode cellular mechanism, the rattlesnake is able to process and fusion infrared and visible signals, improving the predatory ability. In this paper, we design an auto-encoder fusion network based on the visual adversarial receptor domain. In this network, we build a feature-level fusion strategy based on the dual-modal cell mechanism which is simulated by the human visual cell’s center-antagonistic receptor domain. Meanwhile, we optimize the feature extraction and feature reconstruction modules in fusion network. By realized the combined research of biological vision and computer vision, our network delivers a better performance than the state-of-the-art methods in both subjective and objective evaluation.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"247 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121485461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learnable Upsampling-Based Point Cloud Semantic Segmentation","authors":"Xue Xiang, Wenpeng Zong, Guangyun Li","doi":"10.1109/ICIVC55077.2022.9886287","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886287","url":null,"abstract":"The point cloud semantic segmentation network based on point-wise multi-layer perceptron (MLP) has been widely applied with its end-to-end advantages. Normally, such networks use the traditional upsampling algorithm to recover the details of point clouds in the decoding stage. However, the point cloud has rich 3D geometric information. The traditional interpolation algorithm does not consider the geometric correlation in the process of recovering the details of the point cloud, resulting in the inaccurate output point features. To this end, a learnable upsampling algorithm is proposed in this paper. This upsampling algorithm is implemented by utilizing moving least squares (MLS) and radial basis function (RBF), which can fully exploit the local geometric features of point clouds and accurately restore the details of scenarios. The validity of the proposed upsampling operator is verified on the Semantic3D dataset. Experimental results show that the proposed upsampling algorithm is superior to the widely applied traditional interpolation algorithms when used for point cloud semantic segmentation.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115877305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MITPose: Multi-Granularity Feature Interaction for Human Pose Estimation","authors":"Jiayu Zou, Jie Qin, Zhen Zhang, Xingang Wang","doi":"10.1109/ICIVC55077.2022.9887304","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9887304","url":null,"abstract":"Human pose estimation is broadly used in action recognition, Re-Identity, and multi-object tracking. Recently deep convolutional neural networks have demonstrated their great power in human pose estimation. However, CNN-based methods are limited by the constrained receptive field that has poor performance in modeling global relationships of different body parts. In this paper, we propose a novel multi-granularity feature interaction network for human pose estimation (MITPose), which exploits the multi-granularity feature interaction in global-local level features, multi-scale features, and locality features. Our MITPose can efficiently leverage the long-range representation ability of transformer net and inductive locality of convolution net to obtain the comprehensive information for key point localization and relationship modeling. Extensive experiments illustrate that our proposed MITPose achieves state-of-the-art performance on the public COCO dataset.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128343949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Approach for Smile Recognition via Deep Convolutional Neural Networks","authors":"Yuanzhu Liu, Zuoli Liu, Yong Zhao, Junli Xu","doi":"10.1109/ICIVC55077.2022.9886093","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886093","url":null,"abstract":"Smile recognition is a difficult research issue in the fields of computer vision and pattern recognition. Most of existing algorithms are only suitable for western people's smile recognition in simple backgrounds, and cannot well recognize Chinese people's smile in complex backgrounds. In order to solve this problem, we first construct a dataset composed of 4,000 western face images and 4,000 Chinese face images. Especially, 5,000 images in this dataset have complex backgrounds. Then, we use this dataset to train a convolutional neural network, a residual neural network, and a lightweight neural network for smile recognition, respectively. Various experiments show that our algorithm has a good generalization ability to recognize the smile of both western people and Chinese people robustly even in complex backgrounds.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125379626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-Supervised Semantic Segmentation of Class-Imbalanced Images: A Hierarchical Self-Attention Generative Adversarial Network","authors":"Lu Chai, Qinyuan Liu","doi":"10.1109/ICIVC55077.2022.9886496","DOIUrl":"https://doi.org/10.1109/ICIVC55077.2022.9886496","url":null,"abstract":"How to train models with unlabeled data and implement one trained model across several data sets are key problems in computer vision applications that require high-cost annotations. Recently, a generative model [1] proves its advantages in semi-supervised segmentation and out-of-domain generalization. However, this method becomes less effective when meet with class-imbalanced images whose foreground occupies small areas. To solve this problem, we introduce a hierarchical generative model with a self-attention mechanism to help with capturing features of foreground objects. Concretely, we apply a two-stage hierarchical generative model to perform image synthesis with the self-attention mechanism. Since attention maps are also semantic labels in segmentation fields, the hierarchical self-attention model can synthesize images and corresponding segmentation labels simultaneously. At test time, the segmentation is achieved by mapping input images into latent presentations with two encoders and synthesizing labels with the generative model. We evaluate our hierarchical model on three biomedical segmentation data sets. The experimental results demonstrate that our method outperforms other baselines on semi-supervised segmentation of class-imbalanced images, and meanwhile, pre-serves out-of-domain generalization ability.","PeriodicalId":227073,"journal":{"name":"2022 7th International Conference on Image, Vision and Computing (ICIVC)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117044111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}