IET Computer Vision最新文献_第4页

Outliers rejection for robust camera pose estimation using graduated non-convexity

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-23 DOI: 10.1049/cvi2.12330

Hao Yi, Bo Liu, Bin Zhao, Enhai Liu

{"title":"Outliers rejection for robust camera pose estimation using graduated non-convexity","authors":"Hao Yi, Bo Liu, Bin Zhao, Enhai Liu","doi":"10.1049/cvi2.12330","DOIUrl":"https://doi.org/10.1049/cvi2.12330","url":null,"abstract":"Camera pose estimation plays a crucial role in computer vision, which is widely used in augmented reality, robotics and autonomous driving. However, previous studies have neglected the presence of outliers in measurements, so that even a small percentage of outliers will significantly degrade precision. In order to deal with outliers, this paper proposes using a graduated non-convexity (GNC) method to suppress outliers in robust camera pose estimation, which serves as the core of GNCPnP. The authors first reformulate the camera pose estimation problem using a non-convex cost, which is less affected by outliers. Then, to apply a non-minimum solver to solve the reformulated problem, the authors use the Black-Rangarajan duality theory to transform it. Finally, to address the dependence of non-convex optimisation on initial values, the GNC method was customised according to the truncated least squares cost. The results of simulation and real experiments show that GNCPnP can effectively handle the interference of outliers and achieve higher accuracy compared to existing state-of-the-art algorithms. In particular, the camera pose estimation accuracy of GNCPnP in the case of a low percentage of outliers is almost comparable to that of the state-of-the-art algorithm in the case of no outliers.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12330","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143363007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Weakly supervised bounding-box generation for camera-trap image based animal detection

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-20 DOI: 10.1049/cvi2.12332

Puxuan Xie, Renwu Gao, Weizeng Lu, Linlin Shen

{"title":"Weakly supervised bounding-box generation for camera-trap image based animal detection","authors":"Puxuan Xie, Renwu Gao, Weizeng Lu, Linlin Shen","doi":"10.1049/cvi2.12332","DOIUrl":"https://doi.org/10.1049/cvi2.12332","url":null,"abstract":"In ecology, deep learning is improving the performance of camera-trap image based wild animal analysis. However, high labelling cost becomes a big challenge, as it requires involvement of huge human annotation. For example, the Snapshot Serengeti (SS) dataset contains over 900,000 images, while only 322,653 contains valid animals, 68,000 volunteers were recruited to provide image level labels such as species, the no. of animals and five behaviour attributes such as standing, resting and moving etc. In contrast, the Gold Standard SS Bounding-Box Coordinates (GSBBC for short) contains only 4011 images for training of object detection algorithms, as the annotation of bounding-box for animals in the image, is much more costive. Such a no. of training images, is obviously insufficient. To address this, the authors propose a method to generate bounding-boxes for a larger dataset using limited manually labelled images. To achieve this, the authors first train a wild animal detector using a small dataset (e.g. GSBBC) that is manually labelled to locate animals in images; then apply this detector to a bigger dataset (e.g. SS) for bounding-box generation; finally, we remove false detections according to the existing label information of the images. Experiments show that detector trained with images whose bounding-boxes are generated using the proposal, outperformed the existing camera-trap image based animal detection, in terms of mean average precision (mAP). Compared with the traditional data augmentation method, our method improved the mAP by 21.3% and 44.9% for rare species, also alleviating the long-tail issue in data distribution. In addition, detectors trained with the proposed method also achieve promising results when applied to classification and counting tasks, which are commonly required in wildlife research.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12332","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143363031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Guest Editorial: Anomaly detection and open-set recognition applications for computer vision

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-19 DOI: 10.1049/cvi2.12329

Hakan Cevikalp, Robi Polikar, Ömer Nezih Gerek, Songcan Chen, Chuanxing Geng

{"title":"Guest Editorial: Anomaly detection and open-set recognition applications for computer vision","authors":"Hakan Cevikalp, Robi Polikar, Ömer Nezih Gerek, Songcan Chen, Chuanxing Geng","doi":"10.1049/cvi2.12329","DOIUrl":"https://doi.org/10.1049/cvi2.12329","url":null,"abstract":"Anomaly detection is a method employed to identify data points or patterns that significantly deviate from expected or normal behaviour within a dataset. This approach aims to detect observations regarded as unusual, erroneous, anomalous, rare, or potentially indicative of fraudulent or malicious activity. Open-set recognition, also referred to as open-set identification or open-set classification, is a pattern recognition task that extends traditional classification by addressing the presence of unknown or novel classes during the testing phase. This approach highlights a strong connection between anomaly detection and open-set recognition, as both seek to identify samples originating from unknown classes or distributions. Open-set recognition methods frequently involve modelling both known and unknown classes during training, allowing for the capture of the distribution of known classes while explicitly addressing the space of unknown classes. Techniques in open-set recognition may include outlier detection, density estimation, or configuring decision boundaries to better differentiate between known and unknown classes. This special issue calls for original contributions introducing novel datasets, innovative architectures, and advanced training methods for tasks related to visual anomaly detection and open-set recognition.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1069-1071"},"PeriodicalIF":1.5,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12329","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Autoencoder-based unsupervised one-class learning for abnormal activity detection in egocentric videos

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-19 DOI: 10.1049/cvi2.12333

Haowen Hu, Ryo Hachiuma, Hideo Saito

{"title":"Autoencoder-based unsupervised one-class learning for abnormal activity detection in egocentric videos","authors":"Haowen Hu, Ryo Hachiuma, Hideo Saito","doi":"10.1049/cvi2.12333","DOIUrl":"https://doi.org/10.1049/cvi2.12333","url":null,"abstract":"In recent years, abnormal human activity detection has become an important research topic. However, most existing methods focus on detecting abnormal activities of pedestrians in surveillance videos; even those methods using egocentric videos deal with the activities of pedestrians around the camera wearer. In this paper, the authors present an unsupervised auto-encoder-based network trained by one-class learning that inputs RGB image sequences recorded by egocentric cameras to detect abnormal activities of the camera wearers themselves. To improve the performance of network, the authors introduce a ‘re-encoding’ architecture and a regularisation loss function term, minimising the KL divergence between the distributions of features extracted by the first and second encoders. Unlike the common use of KL divergence loss to obtain a feature distribution close to an already-known distribution, the aim is to encourage the features extracted by the second encoder to have a close distribution to those extracted from the first encoder. The authors evaluate the proposed method on the Epic-Kitchens-55 dataset and conduct an ablation study to analyse the functions of different components. Experimental results demonstrate that the method outperforms the comparison methods in all cases and demonstrate the effectiveness of the proposed re-encoding architecture and the regularisation term.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12333","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Metric-guided class-level alignment for domain adaptation

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-15 DOI: 10.1049/cvi2.12322

Xiaoshun Wang, Yunhan Li

{"title":"Metric-guided class-level alignment for domain adaptation","authors":"Xiaoshun Wang, Yunhan Li","doi":"10.1049/cvi2.12322","DOIUrl":"https://doi.org/10.1049/cvi2.12322","url":null,"abstract":"The utilisation of domain adaptation methods facilitates the resolution of classification challenges in an unlabelled target domain by capitalising on the labelled information from source domains. Unfortunately, previous domain adaptation methods have focused mostly on global domain adaptation and have not taken into account class-specific data, which leads to poor knowledge transfer performance. The study of class-level domain adaptation, which aims to precisely match the distributions of different domains, has garnered attention in recent times. However, existing investigations into class-level alignment frequently align domain features either directly on or in close proximity to classification boundaries, resulting in the creation of uncertain samples that could potentially impair classification accuracy. To address the aforementioned problem, we propose a new approach called metric-guided class-level alignment (MCA) as a solution to this problem. Specifically, we employ different metrics to enable the network to acquire supplementary information, thereby enhancing class-level alignment. Moreover, MCA can be effectively combined with existing domain-level alignment methods to successfully mitigate the challenges posed by domain shift. Extensive testing on commonly-used public datasets shows that our method outperforms many other cutting-edge domain adaptation methods, showing significant gains over baseline performance.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12322","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Representation alignment contrastive regularisation for multi-object tracking

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-15 DOI: 10.1049/cvi2.12331

Shujie Chen, Zhonglin Liu, Jianfeng Dong, Xun Wang, Di Zhou

{"title":"Representation alignment contrastive regularisation for multi-object tracking","authors":"Shujie Chen, Zhonglin Liu, Jianfeng Dong, Xun Wang, Di Zhou","doi":"10.1049/cvi2.12331","DOIUrl":"https://doi.org/10.1049/cvi2.12331","url":null,"abstract":"Achieving high-performance in multi-object tracking algorithms heavily relies on modelling spatial-temporal relationships during the data association stage. Mainstream approaches encompass rule-based and deep learning-based methods for spatial-temporal relationship modelling. While the former relies on physical motion laws, offering wider applicability but yielding suboptimal results for complex object movements, the latter, though achieving high-performance, lacks interpretability and involves complex module designs. This work aims to simplify deep learning-based spatial-temporal relationship models and introduce interpretability into features for data association. Specifically, a lightweight single-layer transformer encoder is utilised to model spatial-temporal relationships. To make features more interpretative, two contrastive regularisation losses based on representation alignment are proposed, derived from spatial-temporal consistency rules. By applying weighted summation to affinity matrices, the aligned features can seamlessly integrate into the data association stage of the original tracking workflow. Experimental results showcase that our model enhances the majority of existing tracking networks' performance without excessive complexity, with minimal increase in training overhead and nearly negligible computational and storage costs.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12331","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Hybrid feature-based moving cast shadow detection

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-13 DOI: 10.1049/cvi2.12328

Jiangyan Dai, Huihui Zhang, Jin Gao, Chunlei Chen, Yugen Yi

{"title":"Hybrid feature-based moving cast shadow detection","authors":"Jiangyan Dai, Huihui Zhang, Jin Gao, Chunlei Chen, Yugen Yi","doi":"10.1049/cvi2.12328","DOIUrl":"https://doi.org/10.1049/cvi2.12328","url":null,"abstract":"The accurate detection of moving objects is essential in various applications of artificial intelligence, particularly in the field of intelligent surveillance systems. However, the moving cast shadow detection significantly decreases the precision of moving object detection because they share similar motion characteristics. To address the issue, the authors propose an innovative approach to detect moving cast shadows by combining the hybrid feature with a broad learning system (BLS). The approach involves extracting low-level features from the input and background images based on colour constancy and texture consistency principles that are shown to be highly effective in moving cast shadow detection. The authors then utilise the BLS to create a hybrid feature and BLS uses the extracted low-level features as input instead of the original data. BLS is an innovative form of deep learning that can map input to feature nodes and further enhance them by enhancement nodes, resulting in more compact features for classification. Finally, the authors develop an efficient and straightforward post-processing technique to improve the accuracy of moving object detection. To evaluate the effectiveness and generalisation ability, the authors conduct extensive experiments on public ATON-CVRR and CDnet datasets to verify the superior performance of our method by comparing with representative approaches.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12328","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

High precision light field image depth estimation via multi-region attention enhanced network

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-10 DOI: 10.1049/cvi2.12326

Jie Li, Wenxuan Yang, Chuanlun Zhang, Heng Li, Xinjia Li, Lin Wang, Yanling Wang, Xiaoyan Wang

{"title":"High precision light field image depth estimation via multi-region attention enhanced network","authors":"Jie Li, Wenxuan Yang, Chuanlun Zhang, Heng Li, Xinjia Li, Lin Wang, Yanling Wang, Xiaoyan Wang","doi":"10.1049/cvi2.12326","DOIUrl":"https://doi.org/10.1049/cvi2.12326","url":null,"abstract":"Light field (LF) depth estimation is a key task with numerous practical applications. However, achieving high-precision depth estimation in challenging scenarios, such as occlusions and detailed regions (e.g. fine structures and edges), remains a significant challenge. To address this problem, the authors propose a LF depth estimation network based on multi-region selection and guided optimisation. Firstly, we construct a multi-region disparity selection module based on angular patch, which selects specific regions for generating angular patch, achieving representative sub-angular patch by balancing different regions. Secondly, different from traditional guided deformable convolution, the guided optimisation leverages colour prior information to learn the aggregation of sampling points, which enhances the deformable convolution ability by learning deformation parameters and fitting irregular windows. Finally, to achieve high-precision LF depth estimation, the authors have developed a network architecture based on the proposed multi-region disparity selection and guided optimisation module. Experiments demonstrate the effectiveness of network on the HCInew dataset, especially in handling occlusions and detailed regions.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1390-1406"},"PeriodicalIF":1.5,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12326","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

DPANet: Position-aware feature encoding and decoding for accurate large-scale point cloud semantic segmentation

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-12-05 DOI: 10.1049/cvi2.12325

Haoying Zhao, Aimin Zhou

{"title":"DPANet: Position-aware feature encoding and decoding for accurate large-scale point cloud semantic segmentation","authors":"Haoying Zhao, Aimin Zhou","doi":"10.1049/cvi2.12325","DOIUrl":"https://doi.org/10.1049/cvi2.12325","url":null,"abstract":"Due to the scattered, unordered, and unstructured nature of point clouds, it is challenging to extract local features. Existing methods tend to design redundant and less-discriminative spatial feature extraction methods in the encoder, while neglecting the utilisation of uneven distribution in the decoder. In this paper, the authors fully exploit the characteristics of the imbalanced distribution in point clouds and design our Position-aware Encoder (PAE) module and Position-aware Decoder (PAD) module. In the PAE module, the authors extract position relationships utilising both Cartesian coordinate system and polar coordinate system to enhance the distinction of features. In the PAD module, the authors recognise the inherent positional disparities between each point and its corresponding upsampled point, utilising these distinctions to enrich features and mitigate information loss. The authors conduct extensive experiments and compare the proposed DPANet with existing methods on two benchmarks S3DIS and Semantic3D. The experimental results demonstrate that the method outperforms the state-of-the-art approaches.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1376-1389"},"PeriodicalIF":1.5,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12325","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Reducing overfitting in vehicle recognition by decorrelated sparse representation regularisation

IF 1.5 4区计算机科学

IET Computer Vision Pub Date : 2024-11-30 DOI: 10.1049/cvi2.12320

Wanyu Wei, Xinsha Fu, Siqi Ma, Yaqiao Zhu, Ning Lu

{"title":"Reducing overfitting in vehicle recognition by decorrelated sparse representation regularisation","authors":"Wanyu Wei, Xinsha Fu, Siqi Ma, Yaqiao Zhu, Ning Lu","doi":"10.1049/cvi2.12320","DOIUrl":"https://doi.org/10.1049/cvi2.12320","url":null,"abstract":"Most state-of-the-art vehicle recognition methods benefit from the excellent feature extraction capabilities of convolutional neural networks (CNNs), which allow the models to perform well on the intra-dataset. However, they often show poor generalisation when facing cross-datasets due to the overfitting problem. For this issue, numerous studies have shown that models do not generalise well in new scenarios due to the high correlation between the representations in CNNs. Furthermore, over-parameterised CNNs have a large number of redundant representations. Therefore, we propose a novel Decorrelated Sparse Representation (DSR) regularisation. (1) It tries to minimise the correlation between feature maps to obtain decorrelated representations. (2) It forces the convolution kernels to extract meaningful features by allowing the sparse kernels to have additional optimisation. The DSR regularisation encourages diverse representations to reduce overfitting. Meanwhile, DSR can be applied to a wide range of vehicle recognition methods based on CNNs, and it does not require additional computation in the testing phase. In the experiments, DSR performs better than the original model on the intra-dataset and cross-dataset. Through ablation analysis, we find that DSR can drive the model to focus on the essential differences among all kinds of vehicles.","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1351-1361"},"PeriodicalIF":1.5,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12320","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0