{"title":"An Augmented Reality Tracking Registration Method Based on Deep Learning","authors":"Xingya Yan, Guangrui Bai, Chaobao Tang","doi":"10.1145/3573942.3574034","DOIUrl":"https://doi.org/10.1145/3573942.3574034","url":null,"abstract":"Augmented reality is a three-dimensional visualization technology that can carry out human-computer interaction. Virtual information is placed in the designated area of the real world to enhance real-world information. Based on the existing implementation process of augmented reality, this paper proposes an augmented reality method based on deep learning, aiming at the inaccurate positioning and model drift of the augmented reality method without markers in complex backgrounds, light changes, and partial occlusion. The proposed method uses the lightweight SSD model for target detection, the SURF algorithm to extract feature points and the FLANN algorithm for feature matching. Experimental results show that this method can effectively solve the problems of inaccurate positioning and model drift under particular circumstances while ensuring the operational efficiency of the augmented reality system.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122299187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Correlation Filter Tracking for UAV Based on Temporal and Spatial Regularization with Boolean Maps","authors":"Na Li, Jiale Gao, Y. Liu, Yansheng Zhu, Wenhan Jiang","doi":"10.1145/3573942.3574036","DOIUrl":"https://doi.org/10.1145/3573942.3574036","url":null,"abstract":"Object tracking is now widely used in sports event broadcasting, security surveillance, and human-computer interaction. It is a challenging task for tracking on unmanned aerial vehicle (UAV) datasets due to many factors such as illumination change, appearance modification, occlusion, motion blur and so on. To solve the problem, a visual correlation filter tracking algorithm based on temporal and spatial regularization is proposed. It employs boolean maps to obtain visual attention, and fuses different features such as color names (CN), histogram of oriented gradient (HOG) and Gray features to enhance the visual representation. New object occlusion judgment method and model update strategy are put forward to make the tracker more robust. The proposed algorithm is compared with other six trackers in terms of distant precision and success rate on UAV123. And the experimental results show that it achieves more stable and robust tracking performance.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129410519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effects of PM2.5 on the Detection Performance of Quantum Interference Radar","authors":"Lihao Tian, Min Nie, Guang Yang","doi":"10.1145/3573942.3574117","DOIUrl":"https://doi.org/10.1145/3573942.3574117","url":null,"abstract":"In order to study the influence of PM2.5 particles on the detection performance of quantum interference radar, this article analyzes the relationship between the concentration of PM2.5 particles and the extinction coefficient under different particle sizes based on the spectral distribution function of PM2.5 particles and the Mie scattering theory. Then establish the influence model of PM2.5 particles on the detection distance and maximum detection error probability of quantum interference radar. The simulation results show that as the concentration of PM2.5 particles increases, the extinction coefficient of PM2.5 particles shows a gradually increasing trend; the energy of the detected photons is attenuated, resulting in a decrease in the transmission distance of the photons; when the energy of the emitted photons remains unchanged, The maximum detection error probability of quantum interference radar increases with the increase of PM2.5 particle concentration; when the PM2.5 particle concentration remains unchanged, the maximum detection error probability decreases gradually with the increase of the emitted photon energy. Therefore, the average number of emitted photons should be appropriately adjusted according to PM2.5 pollution in order to reduce the impact of PM2.5 atmospheric pollution on the detection performance of quantum interference radar.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129346892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Image Dehazing Via Enhanced CycleGAN","authors":"Sheping Zhai, Yuanbiao Liu, Dabao Cheng","doi":"10.1145/3573942.3574097","DOIUrl":"https://doi.org/10.1145/3573942.3574097","url":null,"abstract":"Due to the influence of atmospheric light scattering, the images acquired by outdoor imaging device in haze scene will appear low definition, contrast reduction, overexposure and other visible quality degradation, which makes it difficult to handle the relevant computer vision tasks. Therefore, image dehazing has become an important research area of computer vision. However, existing dehazing methods generally require paired image datasets that include both hazy images and corresponding ground truth images, while the recovered images are easy to occur color distortion and detail loss. In this study, an end-to-end image dehazing method based on Cycle-consistent Generative Adversarial Networks (CycleGAN) is proposed. For effectively learning the mapping relationship between hazy images and clear images, we refine the transformation module of the generator by weighting optimization, which can promote the network adaptability to scale. Then in order to further improve the quality of generated images, the enhanced perceptual loss and low-frequency loss combined with image feature attributes are constructed in the overall optimization objective of the network. The experimental results show that our dehazing algorithm effectively recovers the texture information while correcting the color distortion of original CycleGAN, and the recovery effect is clear and more natural, which better reduces the influence of haze on the imaging quality.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129490217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperspectral Anomaly Detection based on Autoencoder using Superpixel Manifold Constraint","authors":"Yuquan Gan, Wenqiang Li, Y. Liu, Jinglu He, Ji Zhang","doi":"10.1145/3573942.3574108","DOIUrl":"https://doi.org/10.1145/3573942.3574108","url":null,"abstract":"In the field of hyperspectral anomaly detection, autoencoder (AE) have become a hot research topic due to their unsupervised characteristics and powerful feature extraction capability. However, autoencoders do not keep the spatial structure information of the original data well during the training process, and is affected by anomalies, resulting in poor detection performance. To address these problems, a hyperspectral anomaly detection method based on autoencoders with superpixel manifold constraints is proposed. Firstly, superpixel segmentation technique is used to obtain the superpixels of the hyperspectral image, and then the manifold learning method is used to learn the embedded manifold that based on the superpixels. Secondly, the learned manifold constraints are embedded in the autoencoder to learn the potential representation, which can maintain the consistency of the local spatial and geometric structure of the hyperspectral images (HSI). Finally, anomalies are detected by computing reconstruction errors of the autoencoder. Extensive experiments are conducted on three datasets, and the experimental results show that the proposed method has better detection performance than other hyperspectral anomaly detectors.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123665287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Dialogue Generation Based on Transformer and Collaborative Attention","authors":"Wei Guan, Zhen Zhang, Li Ma","doi":"10.1145/3573942.3574091","DOIUrl":"https://doi.org/10.1145/3573942.3574091","url":null,"abstract":"In view of the fact that the current multimodal dialogue generation models are based on a single image for question-and-answer dialogue generation, the image information cannot be deeply integrated into the sentences, resulting in the inability to generate semantically coherent, informative visual contextual dialogue responses, which further limits the application of multimodal dialogue generation models in real scenarios. This paper proposes a Deep Collaborative Attention Model (DCAN) method for multimodal dialogue generation tasks. First, the method globally encode the dialogue context and its corresponding visual context information respectively; second, to guide the simultaneous learning of interactions between image and text multimodal representations, after the visual context features are fused with the dialogue context features through the collaborative attention mechanism, the hadamard product is used to fully fuse the multimodal features again to improve the network performance; finally, the fused features are fed into a transformer-based decoder to generate coherent, informative responses. in order to solve the problem of continuous dialogue in multimodal dialogue, the method of this paper uses the OpenVidial2.0 data set to conduct experiments. The results show that the responses generated by this model have higher correlation and diversity than existing comparison models, and it can effectively integrate visual context information.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114528444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voicifier-LN: An Novel Approach to Elevate the Speaker Similarity for General Zero-shot Multi-Speaker TTS","authors":"Dengfeng Ke, Liangjie Huang, Wenhan Yao, Ruixin Hu, Xueyin Zu, Yanlu Xie, Jinsong Zhang","doi":"10.1145/3573942.3574120","DOIUrl":"https://doi.org/10.1145/3573942.3574120","url":null,"abstract":"Speeches generated from neural network-based Text-to-Speech (TTS) have been becoming more natural and intelligible. However, the evident dropping performance still exists when synthesizing multi-speaker speeches in zero-shot manner, especially for those from different countries with different accents. To bridge this gap, we propose a novel method, called Voicifier. It firstly operates on high frequency mel-spectrogram bins to approximately remove the content and rhythm. Then Voicifier uses two strategies, from the shallow to the deep mixing, to further destroy the content and rhythm but retain the timbre. Furthermore, for better zero-shot performance, we propose Voice-Pin Layer Normalization (VPLN) which pins down the timbre according with the text feature. During inference, the model is allowed to synthesize high quality and similarity speeches with just around 1 sec target speech audio. Experiments and ablation studies prove that the methods are able to retain more target timbre while abandoning much more of the content and rhythm-related information. To our best knowledge, the methods are found to be universal that is to say it can be applied to most of the existing TTS systems to enhance the ability of cross-speaker synthesis.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114620721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental Encoding Transformer Incorporating Common-sense Awareness for Conversational Sentiment Recognition","authors":"Xiao Yang, Xiaopeng Cao, Hao Liang","doi":"10.1145/3573942.3573965","DOIUrl":"https://doi.org/10.1145/3573942.3573965","url":null,"abstract":"Conversational sentiment recognition has been widely used in people's lives and work. However, machines do not understand emotions through common-sense cognition. We propose an Incremental Encoding Transformer Incorporating Common-sense Awareness (IETCA) model. The model helps the machines use common-sense knowledge to better understand emotions in conversation. The model uses a context-aware graph attention mechanism to obtain knowledge-rich utterance representations and uses an incremental encoding Transformer to get rich contextual representations. We do some experiments on five datasets. The results show that the model has some improvement in conversational sentiment recognition.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"169 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113987211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dual-Task Deep Neural Network for Scene and Action Recognition Based on 3D SENet and 3D SEResNet","authors":"Zhouzhou Wei, Yuelei Xiao","doi":"10.1145/3573942.3574077","DOIUrl":"https://doi.org/10.1145/3573942.3574077","url":null,"abstract":"Aiming at the problem that scene information will become noise and cause interference in the feature extraction stage of action recognition, a dual-task deep neural network model for scene and action recognition is proposed. The model first uses a convolutional layer and max pooling layer as shared layers to extract low-dimensional features, then uses 3D SEResNet for action recognition and 3D SENet for scene recognition, and finally outputs their respective results. In addition, to solve the problem that the existing public dataset is not associated with the scene, a scene and action dataset (SAAD) for recognition is built by ourselves. Experimental results show that our method performs better than other methods on SAAD dataset.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127736322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Network Prediction Model Based on Differential Localization","authors":"Yuanhua Liu, Ruini Li, Xinliang Niu","doi":"10.1145/3573942.3573960","DOIUrl":"https://doi.org/10.1145/3573942.3573960","url":null,"abstract":"The Global Navigation Satellite System-Reflectometry (GNSS-R) is affected by buildings, trees, etc. during the transmission process, which generates large errors. The traditional method is to use differential to eliminate most of the errors to improve positioning accuracy. In this paper, a neural network prediction model based on differential results is proposed, which uses the differential results X, Y and Z as the inputs of the neural network to predict the satellite position, and finally compare it with the real value. The paper uses Artificial Neural Network (ANN), Recurrent Neural Network (RNN) and Long Short Term Memory-Recurrent Neural Network (LSTM-RNN) are used to establish training models and make predictions. The results show that compared with the ANN model, the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) of the RNN model are reduced by 1.54% and 3.59%, respectively; compared with the RNN model, the MAPE and RMSE of the LSTM-RNN model are reduced by 21.16% and 14.81%, respectively, which proves that the training accuracy and fit of the LSTM-RNN are better.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126429399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}