Xuehui Li, Yongjun Zhang, Yi Zhang, Dian-xi Shi, Huachi Xu
{"title":"Object Tracking Algorithm for Siamese Network Combined with Channel Attention Mechanism","authors":"Xuehui Li, Yongjun Zhang, Yi Zhang, Dian-xi Shi, Huachi Xu","doi":"10.1145/3529466.3529476","DOIUrl":"https://doi.org/10.1145/3529466.3529476","url":null,"abstract":"As an important branch of computer vision, object tracking is widely used in areas such as intelligent video surveillance, human-computer interaction and autonomous driving. Although object tracking has advanced considerably in recent years, tracking in complex environments is still a challenge. Due to problems such as occlusion, object deformation, and illumination change, tracking can become inaccurate and unstable. In this paper, an object tracking algorithm for a Siamese network combined with a channel attention mechanism is proposed. Firstly, the Siamese network is used to improve the ability to discriminate features; secondly, the channel attention mechanism is introduced to design a cross-correlation module, DCAM (Depth-wise Cross-correlation with Attention Mechanism), which pays more attention to the features that are beneficial to the tracking results; finally, the stochastic weight averaging method is used to train the network to further improve the overall performance of the tracker. 
Experimental results on public data sets show that the proposed algorithm has higher accuracy and more stable tracking performance in complex tracking environments.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126042941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Traffic Sign Detection in Complex Environment based on Multi-Scale Feature Enhancement and Group Attention","authors":"JinFei Fu, Yinghua Zhou","doi":"10.1145/3529466.3529502","DOIUrl":"https://doi.org/10.1145/3529466.3529502","url":null,"abstract":"Since traffic sign detection is essential for autonomous driving, it is studied intensively. However, traffic sign detection is difficult, especially in a complex environment. The traffic signs should be located first. Their unique features should be extracted next and then fed into the classifier. In this paper, we adopt the current mainstream deep neural network-based object detection method for traffic sign detection. In our work, we add specific environmental noise features to the dataset. A lightweight network, YOLOv4-Tiny, is chosen as the baseline network, and a multi-scale feature fusion module is designed to improve the performance of the network model. A lightweight group attention module is also designed. Experiments are carried out on the GTSDB dataset, and the results show that the proposed model outperforms the other models in terms of precision and mAP.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126221676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Atmospheric Correction for Polarimetric Images Based on Spectral Segregation","authors":"Pu Xia, Xiaolai Chen, Zhaohuan Tang","doi":"10.1145/3529466.3529479","DOIUrl":"https://doi.org/10.1145/3529466.3529479","url":null,"abstract":"In hazy weather, the penetration of light depends on wavelength: the longer the wavelength, the less the attenuation. Although traditional polarimetric image-dehazing algorithms have demonstrated their ability to enhance grayscale images, they ignore spectral differences, which leads to serious color distortion when they are applied to color images. To overcome this problem, we propose a new method based on spectral segregation. Fifteen spectral bands are selected and dehazed separately with the polarimetric dehazing algorithm to obtain the best dehazing effects. The blue, green and red channels of the dehazed image, which are acquired through image fusion of the spectral bands, are adjusted with different coefficients to correct the color distortion. Ten infrared bands are added to the short-wavelength channels to enhance the details of objects, especially trees. Experiments and data analysis demonstrate the effectiveness of our method in increasing visibility and preserving color information. 
The amount of color distortion can be reduced by 89.6% compared with the polarimetric image-dehazing algorithm without spectral segregation.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124708271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view Network with Transformer for Point Cloud Semantic Segmentation","authors":"Zhongwei Hua, Daming Du","doi":"10.1145/3529466.3529504","DOIUrl":"https://doi.org/10.1145/3529466.3529504","url":null,"abstract":"The input of most point cloud semantic segmentation networks is the reconstructed complete point cloud, but in practical application scenarios, the vision devices often capture single frame point cloud data. In order to better adapt to the actual segmentation requirements in dynamic scenes, this paper proposes an online incremental point cloud semantic segmentation method, which inputs the existing saved point cloud and the currently captured point cloud into the network to make up for the lack of information in the single frame point cloud. The Transformer structure is added to the network to strengthen the fusion of contextual information. Triple Loss is introduced in the feature space to distinguish different types of point clouds in a fine-grained manner. The experimental results show that compared with the benchmark MCPNet model, the proposed semantic segmentation model improves mIoU by 2.8% and mAcc by 7% on the S3DIS Area5 dataset, further improving the accuracy of point cloud semantic segmentation.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127343436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Emotion Recognition Exploiting ASR-based and Phonological Knowledge Representations","authors":"Shuang Liang, Xiang Xie, Qingran Zhan, Hao-bo Cheng","doi":"10.1145/3529466.3529488","DOIUrl":"https://doi.org/10.1145/3529466.3529488","url":null,"abstract":"Speech emotion recognition (SER) is a challenging problem due to insufficient data. This paper deals with this problem from two aspects. First, we exploit two levels of speech representations for the SER task: automatic speech recognition (ASR)-based representations and phonological knowledge representations. Second, we use transfer learning, pre-training models on other large corpora for non-SER tasks and transferring their knowledge. In our system, the whole model is divided into two parts: a two-representation learning module and an SER module. We fuse acoustic features with ASR-based and phonological knowledge representations, which are both extracted from pre-trained models, and the fused features are used in SER training. Then a novel multi-task learning approach is proposed in which a shared-encoder, multi-decoder model is used for phonological knowledge representation learning. The Conformer structure is introduced for the SER task, and our study indicates that Conformer is effective for SER. 
Finally, experimental results on IEMOCAP show that the proposed method can achieve 77.35 weighted accuracy and 77.99 unweighted accuracy respectively.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116839217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hard-Lite SLAM: A Hybrid Detector Based Real-Time SLAM System","authors":"Chengying Cai, Jichao Jiao, Wei Xu, Mingliang Pang, Jianye Dong","doi":"10.1145/3529466.3529473","DOIUrl":"https://doi.org/10.1145/3529466.3529473","url":null,"abstract":"A Simultaneous Localization and Mapping (SLAM) system is essential for autonomous driving and mobile robots. The problem of data association between features becomes a bottleneck limiting the performance of traditional visual SLAM systems, especially in complex environments. Therefore, many studies combine the SLAM system with Convolutional Neural Networks (CNN) to obtain a more robust data association. This paper shows that CNN-based local descriptors significantly improve the accuracy and robustness of the SLAM system. CNN-based keypoints, however, reduce the performance of the SLAM algorithm in many scenarios. We propose a SLAM system that combines hand-crafted keypoints with CNN local descriptors. The system is more robust in complex environments than traditional visual SLAM systems. The experimental results show that our system achieves higher localization accuracy than ORB-SLAM2 and VINS-Mono on the evaluated datasets. Meanwhile, the CNN local descriptors can be combined with any visual SLAM system and have good portability. Furthermore, with the assistance of Nvidia TensorRT inference acceleration, the system can run in real time on the Jetson AGX Xavier at 27 frames per second. 
","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124552420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPB-UNet++: Semantic Segmentation for Remote Sensing Images of reservoir area via Improved UNet++ with FPN","authors":"Kaiyue Wang, Xiaoye Fan, Q. Wang","doi":"10.1145/3529466.3529483","DOIUrl":"https://doi.org/10.1145/3529466.3529483","url":null,"abstract":"To improve the accuracy of semantic segmentation of remote sensing images in the reservoir area, this paper improves UNet++ and proposes a UNet++ semantic segmentation network model fused with a feature pyramid network, called FPB-UNet++. First, to fully extract semantic information at different scales and enhance the recovery of spatial information in remote sensing images, this paper uses the improved feature pyramid structure as the basic unit of the UNet++ coding structure. Then, the pooling layers between coding units, which lose position information, are removed and replaced with convolutions. Finally, to make full use of multi-scale feature information in the multi-side output part, all the side output feature maps are concatenated and fused in the channel dimension. Experiments on the open and self-built remote sensing image semantic segmentation data sets of the Xiaolangdi Reservoir area show that the network model has a good segmentation effect on feature information.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129996981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event causality extraction based on Transformer sequence annotation model","authors":"Zefeng Xie, Shengwu Xiong","doi":"10.1145/3529466.3529481","DOIUrl":"https://doi.org/10.1145/3529466.3529481","url":null,"abstract":"Text data such as research reports and announcements in the financial field contain a large amount of event causality that can be extracted and applied to downstream tasks such as prediction and Q&A. Traditional event causality extraction methods rely on sentence templates, which cannot cope with multiple causal pairs in a sentence. This paper treats the event causality extraction task as a sequence annotation task. The event causality labels are divided into “core noun in the cause”, “predicate or state in the cause”, “central word”, “core noun in result”, and “predicate or state in result”. We propose using a Transformer sequence annotation model based on lexicon matching to identify and extract event causality. The F1 value of the Transformer model reaches 58.70%, and the F1 of BERT+Transformer is the highest, at 69.49%.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"48 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133847541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Graph Based Approach Towards Exploiting Reviews for Recommendation","authors":"Bo Kong, Caiyan Jia","doi":"10.1145/3529466.3529499","DOIUrl":"https://doi.org/10.1145/3529466.3529499","url":null,"abstract":"Textual reviews, pervasive on many e-commerce websites, contain a lot of information. Many neural network models have been proposed to use the information in reviews to improve the performance of recommender systems. However, existing models usually use convolutional neural networks to learn the features of the reviews; these often focus on the local interactions of words and lack the ability to capture long-distance and non-consecutive word interactions. Their ability to model the high-level interactions between users and items should also be strengthened. Therefore, we propose a multi-view Graph based Approach towards exploiting Reviews for recommendation (GAR). It integrates the information of the review content and the user-item graph. In the review view, we build an individual word co-occurrence graph for each review and use a gated graph convolutional network to learn the features of reviews. In the graph view, we use a graph attention network to model high-order multi-aspect relations in the user-item graph. Both views use a graph based method. The representations of users and items learned from the two views are integrated to predict the final rating. 
Experiments on the benchmark datasets show that GAR achieves significantly better rating prediction accuracy compared to the state-of-the-art methods.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125331947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating federated learning based on grouping aggregation in heterogeneous edge computing","authors":"Longbo Li, C. Li","doi":"10.1145/3529466.3529505","DOIUrl":"https://doi.org/10.1145/3529466.3529505","url":null,"abstract":"Recently, edge devices such as mobile phones and smartwatches have become part of modern distributed systems. Federated learning is an effective distributed learning paradigm that can leverage these edge devices to collaboratively train models without sharing raw data. In federated learning, each device periodically downloads the model from the server, trains it on local data, and uploads it to the server, while the server aggregates the uploaded parameters to update the global model. However, different devices are located in different network environments and have different communication and computation capabilities. Therefore, the system is heterogeneous, and the model training speed depends on the slowest device. To effectively address these problems, we propose to group the devices, first using a synchronous method to aggregate model updates within a group and then aggregating updates between groups asynchronously, and we propose an algorithm based on weight update to aggregate models. 
We conduct extensive simulations on our proposed algorithms, and the results show that they can dramatically accelerate model training while achieving high accuracy.","PeriodicalId":375562,"journal":{"name":"Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125434086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}