{"title":"CLAMOT: 3D Detection and Tracking via Multi-modal Feature Aggregation","authors":"Shuo Zhang, Xiaolong Liu, Wenqi Tao","doi":"10.1145/3529446.3529451","DOIUrl":"https://doi.org/10.1145/3529446.3529451","url":null,"abstract":"In autonomous driving, multi-object tracking (MOT) can help vehicles perceive their surroundings better and perform well-informed motion planning. Methods based on LiDAR suffer from the sparsity of LiDAR points and detect only in a limited range. To address this, we propose a camera and LiDAR aggregation module named CLA-fusion to fuse the features of the two modalities in a point-wise manner. The enhanced points can be used for extracting features through a 3D backbone. For detection, we adopt a center-based method, which detects the centers of objects with a keypoint detector and regresses other attributes, such as 3D size and velocity. In the tracking part, we use a simple but effective matching strategy, closest-point matching. According to the structure and characteristics of the whole framework, we name our model CLAMOT. Our experiments on the nuScenes and Waymo benchmarks achieve competitive results.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133819605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DnT: Learning Unsupervised Denoising Transformer from Single Noisy Image","authors":"Xiaolong Liu, Yu Hong, Qifang Yin, Shuo Zhang","doi":"10.1145/3529446.3529455","DOIUrl":"https://doi.org/10.1145/3529446.3529455","url":null,"abstract":"In the last few years, a myriad of Transformer-based methods have drawn considerable attention due to their outstanding performance on various computer vision tasks. However, most image denoising methods are based on convolutional neural networks (CNNs), and few attempts have been made with Transformers, especially in self-supervised and unsupervised methods. In this paper, we propose a novel unsupervised image Denoising Transformer (DnT) that is trained on only the single noisy input image. Our network combines a Transformer and a CNN to predict the clean counterpart of the input; the training loss is measured on pairs of independent noisy images constructed from the input image. A dropout-based ensemble is used to get the final denoised result by averaging multiple predictions generated by the trained model. Experiments show that the proposed method not only outperforms the state-of-the-art single noisy image denoiser on additive white Gaussian noise (AWGN) removal but also achieves good results on real-world image denoising.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120954911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Covariance Matrix based on Blur Evaluation for Visual-Inertial Navigation","authors":"Yihao Zuo, C. Yan, Qiwei Liu, Xia Wang","doi":"10.1145/3529446.3529462","DOIUrl":"https://doi.org/10.1145/3529446.3529462","url":null,"abstract":"In current mainstream visual-inertial navigation systems, the covariance matrix is set manually and the weight of visual information cannot be adjusted to the degree of image blur, which degrades the accuracy and robustness of the whole system. To solve this problem, this paper proposes a navigation scheme based on an adaptive covariance matrix. The method uses the Laplacian operator to score the degree of blur in each image. The visual covariance matrix is then adjusted according to the score, which changes the weight of visual information in the fusion system according to image quality. Simulation results show that the proposed method effectively improves system accuracy. Compared with the traditional method, the proposed algorithm is more robust when motion blur occurs.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114386906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seismic Data Interpolation by the Projected Iterative Soft-threshold Algorithm for Tight Frame","authors":"Lin Tian, S. Qin","doi":"10.1145/3529446.3529460","DOIUrl":"https://doi.org/10.1145/3529446.3529460","url":null,"abstract":"Seismic data recovery from missing traces is a crucial step in seismic data pre-processing. Recently, researchers have proposed many useful methods to reconstruct seismic data based on compressed sensing. Curvelet frames can sparsely represent the seismic data volume, and an analysis model has been proposed to reconstruct the seismic data. However, the latest discrete curvelet transform has the tight frame property, and recent insights show that a synthesis model is more suitable for a tight frame. A synthesis model is introduced for seismic data reconstruction, and the projected iterative soft-threshold algorithm (pFISTA) is used to solve it. The proposed method recovers both synthetic and real data well. Compared with the analysis model solved by an iterative soft-threshold algorithm (FISTA) in the curvelet domain, the new method improves reconstruction efficiency and reduces computation time.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133596458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrared small target detection algorithm with complex background based on YOLO-NWD","authors":"Xiao Zhou, Lang Jiang, Xujun Guan, Xingang Mou","doi":"10.1145/3529446.3529448","DOIUrl":"https://doi.org/10.1145/3529446.3529448","url":null,"abstract":"Because infrared remote targets occupy few pixels and lack shape and texture information, detecting them reliably has always been a difficult research topic. To improve the accuracy and precision of infrared small target detection under complex background conditions, a deep learning-based infrared small target detection algorithm, YOLO-NWD, is proposed. Based on the characteristics of small and medium targets in infrared images, a multi-channel feature fusion image, combined with an image preprocessing method, is used as the input to the YOLO detection framework. With the SE and ASPP modules, feature weights are explored to improve feature utilization efficiency. Finally, the normalized Wasserstein distance (NWD) loss replaces the original IoU-based loss to reduce sensitivity to position deviations of small targets. The experimental results show that the algorithm proposed in this paper improves the accuracy by 2.5% and the recall rate by 4%.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117250498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"YOLO-oil: A Real-time Transformer Fault Detector toward Small Dataset","authors":"Shaojie Hu, Xianwen Jin, Huigang Wang","doi":"10.1145/3529446.3529461","DOIUrl":"https://doi.org/10.1145/3529446.3529461","url":null,"abstract":"In electrical substations, transformer fault detection relies largely on the human eye, which is inefficient and costly. With an under-oil robot and deep learning algorithms, fault detection can be done without draining the transformer oil. Deep learning methods for computer vision have achieved impressive results on tasks such as object detection. However, such success relies heavily on huge datasets, which are extremely costly and unavailable in some industrial applications. Deep learning algorithms, such as the YOLO series, often fail on small datasets, and test accuracy decreases significantly because the neural network overfits. In this paper, YOLO-oil, a transformer fault detection network based on YOLOv5, is proposed to mitigate overfitting on small datasets. First, we shrink the network depth to obtain a lightweight backbone. Second, we improve the network architecture by decoupling the detection head. Since no open dataset previously existed for transformer fault detection, the authors create a new training dataset and a test dataset. Experimental results on the test set show that our algorithm performs well on the transformer fault detection task and surpasses YOLOv5, which is a great help for industrial applications.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127506212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrared dim and small target detection based on total variation and multiple noise constraints modeling","authors":"Xiaowen Wang, Xiaoyan Xia, Qiao Li, Wei Xue","doi":"10.1145/3529446.3529447","DOIUrl":"https://doi.org/10.1145/3529446.3529447","url":null,"abstract":"To improve infrared dim and small target detection based on the traditional infrared patch-image (IPI) model, a new detection model based on total variation and multiple noise constraints is proposed. We first transform the original infrared image into an IPI, and then total variation regularization constrains the background patch-image in order to reduce the noise on the target image. At the same time, the edge information of the image is preserved to avoid excessive smoothing of the restored background image. Additionally, considering the lack of noise distribution in the patch-image, the combined and norm are introduced to describe the noise more accurately. The experimental results show that the proposed method suppresses background clutter better and improves detection performance effectively.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126858767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group Sparse-based Discriminative Feature Learning for Face Recognition","authors":"Xiaoqun Qiu, Xiaoyu Du, Liyan Deng, Zhen Chen","doi":"10.1145/3529446.3529450","DOIUrl":"https://doi.org/10.1145/3529446.3529450","url":null,"abstract":"The rapid development of facial recognition technology has brought great convenience to daily life, but also serious security risks, especially in the case of occlusion and heavy noise. Faced with this limitation, this letter proposes a fast face recognition framework called Group Sparse-based Discriminative Feature Learning (GSDFL-Net). Specifically, GSDFL-Net uses a novel unified objective function to simultaneously learn the discriminant features, sparse code and classification errors. In the proposed framework, the feature projection is incorporated into the GSDFL-Net model, which reduces the classification errors. Then, we integrate the denoising FFDNet into the proposed GSDFL-Net model to penalize the noisy pixels, which is simultaneously learned by our unified objective function. Besides, we derive an optimization mechanism to encourage obtained learning parameters and decrease the information loss. Extensive experiments demonstrate the effectiveness of the proposed scheme under different conditions, including occlusion and random noise, on the well-known Aleix Martinez and ExYale B databases.","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126629022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","authors":"","doi":"10.1145/3529446","DOIUrl":"https://doi.org/10.1145/3529446","url":null,"abstract":"","PeriodicalId":151062,"journal":{"name":"Proceedings of the 4th International Conference on Image Processing and Machine Vision","volume":"40 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131137859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}