{"title":"Autoencoder-based unsupervised one-class learning for abnormal activity detection in egocentric videos","authors":"Haowen Hu, Ryo Hachiuma, Hideo Saito","doi":"10.1049/cvi2.12333","DOIUrl":"10.1049/cvi2.12333","url":null,"abstract":"<p>In recent years, abnormal human activity detection has become an important research topic. However, most existing methods focus on detecting abnormal activities of pedestrians in surveillance videos; even those methods using egocentric videos deal with the activities of pedestrians around the camera wearer. In this paper, the authors present an unsupervised auto-encoder-based network trained by one-class learning that inputs RGB image sequences recorded by egocentric cameras to detect abnormal activities of the camera wearers themselves. To improve the performance of network, the authors introduce a ‘re-encoding’ architecture and a regularisation loss function term, minimising the KL divergence between the distributions of features extracted by the first and second encoders. Unlike the common use of KL divergence loss to obtain a feature distribution close to an already-known distribution, the aim is to encourage the features extracted by the second encoder to have a close distribution to those extracted from the first encoder. The authors evaluate the proposed method on the Epic-Kitchens-55 dataset and conduct an ablation study to analyse the functions of different components. Experimental results demonstrate that the method outperforms the comparison methods in all cases and demonstrate the effectiveness of the proposed re-encoding architecture and the regularisation term.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12333","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metric-guided class-level alignment for domain adaptation","authors":"Xiaoshun Wang, Yunhan Li","doi":"10.1049/cvi2.12322","DOIUrl":"10.1049/cvi2.12322","url":null,"abstract":"<p>The utilisation of domain adaptation methods facilitates the resolution of classification challenges in an unlabelled target domain by capitalising on the labelled information from source domains. Unfortunately, previous domain adaptation methods have focused mostly on global domain adaptation and have not taken into account class-specific data, which leads to poor knowledge transfer performance. The study of class-level domain adaptation, which aims to precisely match the distributions of different domains, has garnered attention in recent times. However, existing investigations into class-level alignment frequently align domain features either directly on or in close proximity to classification boundaries, resulting in the creation of uncertain samples that could potentially impair classification accuracy. To address the aforementioned problem, we propose a new approach called metric-guided class-level alignment (MCA) as a solution to this problem. Specifically, we employ different metrics to enable the network to acquire supplementary information, thereby enhancing class-level alignment. Moreover, MCA can be effectively combined with existing domain-level alignment methods to successfully mitigate the challenges posed by domain shift. Extensive testing on commonly-used public datasets shows that our method outperforms many other cutting-edge domain adaptation methods, showing significant gains over baseline performance.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12322","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation alignment contrastive regularisation for multi-object tracking","authors":"Shujie Chen, Zhonglin Liu, Jianfeng Dong, Xun Wang, Di Zhou","doi":"10.1049/cvi2.12331","DOIUrl":"10.1049/cvi2.12331","url":null,"abstract":"<p>Achieving high-performance in multi-object tracking algorithms heavily relies on modelling spatial-temporal relationships during the data association stage. Mainstream approaches encompass rule-based and deep learning-based methods for spatial-temporal relationship modelling. While the former relies on physical motion laws, offering wider applicability but yielding suboptimal results for complex object movements, the latter, though achieving high-performance, lacks interpretability and involves complex module designs. This work aims to simplify deep learning-based spatial-temporal relationship models and introduce interpretability into features for data association. Specifically, a lightweight single-layer transformer encoder is utilised to model spatial-temporal relationships. To make features more interpretative, two contrastive regularisation losses based on representation alignment are proposed, derived from spatial-temporal consistency rules. By applying weighted summation to affinity matrices, the aligned features can seamlessly integrate into the data association stage of the original tracking workflow. Experimental results showcase that our model enhances the majority of existing tracking networks' performance without excessive complexity, with minimal increase in training overhead and nearly negligible computational and storage costs.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12331","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143362371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid feature-based moving cast shadow detection","authors":"Jiangyan Dai, Huihui Zhang, Jin Gao, Chunlei Chen, Yugen Yi","doi":"10.1049/cvi2.12328","DOIUrl":"10.1049/cvi2.12328","url":null,"abstract":"<p>The accurate detection of moving objects is essential in various applications of artificial intelligence, particularly in the field of intelligent surveillance systems. However, the moving cast shadow detection significantly decreases the precision of moving object detection because they share similar motion characteristics. To address the issue, the authors propose an innovative approach to detect moving cast shadows by combining the hybrid feature with a broad learning system (BLS). The approach involves extracting low-level features from the input and background images based on colour constancy and texture consistency principles that are shown to be highly effective in moving cast shadow detection. The authors then utilise the BLS to create a hybrid feature and BLS uses the extracted low-level features as input instead of the original data. BLS is an innovative form of deep learning that can map input to feature nodes and further enhance them by enhancement nodes, resulting in more compact features for classification. Finally, the authors develop an efficient and straightforward post-processing technique to improve the accuracy of moving object detection. To evaluate the effectiveness and generalisation ability, the authors conduct extensive experiments on public ATON-CVRR and CDnet datasets to verify the superior performance of our method by comparing with representative approaches.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12328","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High precision light field image depth estimation via multi-region attention enhanced network","authors":"Jie Li, Wenxuan Yang, Chuanlun Zhang, Heng Li, Xinjia Li, Lin Wang, Yanling Wang, Xiaoyan Wang","doi":"10.1049/cvi2.12326","DOIUrl":"10.1049/cvi2.12326","url":null,"abstract":"<p>Light field (LF) depth estimation is a key task with numerous practical applications. However, achieving high-precision depth estimation in challenging scenarios, such as occlusions and detailed regions (e.g. fine structures and edges), remains a significant challenge. To address this problem, the authors propose a LF depth estimation network based on multi-region selection and guided optimisation. Firstly, we construct a multi-region disparity selection module based on angular patch, which selects specific regions for generating angular patch, achieving representative sub-angular patch by balancing different regions. Secondly, different from traditional guided deformable convolution, the guided optimisation leverages colour prior information to learn the aggregation of sampling points, which enhances the deformable convolution ability by learning deformation parameters and fitting irregular windows. Finally, to achieve high-precision LF depth estimation, the authors have developed a network architecture based on the proposed multi-region disparity selection and guided optimisation module. Experiments demonstrate the effectiveness of network on the HCInew dataset, especially in handling occlusions and detailed regions.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1390-1406"},"PeriodicalIF":1.3,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12326","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DPANet: Position-aware feature encoding and decoding for accurate large-scale point cloud semantic segmentation","authors":"Haoying Zhao, Aimin Zhou","doi":"10.1049/cvi2.12325","DOIUrl":"10.1049/cvi2.12325","url":null,"abstract":"<p>Due to the scattered, unordered, and unstructured nature of point clouds, it is challenging to extract local features. Existing methods tend to design redundant and less-discriminative spatial feature extraction methods in the encoder, while neglecting the utilisation of uneven distribution in the decoder. In this paper, the authors fully exploit the characteristics of the imbalanced distribution in point clouds and design our Position-aware Encoder (PAE) module and Position-aware Decoder (PAD) module. In the PAE module, the authors extract position relationships utilising both Cartesian coordinate system and polar coordinate system to enhance the distinction of features. In the PAD module, the authors recognise the inherent positional disparities between each point and its corresponding upsampled point, utilising these distinctions to enrich features and mitigate information loss. The authors conduct extensive experiments and compare the proposed DPANet with existing methods on two benchmarks S3DIS and Semantic3D. The experimental results demonstrate that the method outperforms the state-of-the-art approaches.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1376-1389"},"PeriodicalIF":1.3,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12325","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing overfitting in vehicle recognition by decorrelated sparse representation regularisation","authors":"Wanyu Wei, Xinsha Fu, Siqi Ma, Yaqiao Zhu, Ning Lu","doi":"10.1049/cvi2.12320","DOIUrl":"10.1049/cvi2.12320","url":null,"abstract":"<p>Most state-of-the-art vehicle recognition methods benefit from the excellent feature extraction capabilities of convolutional neural networks (CNNs), which allow the models to perform well on the intra-dataset. However, they often show poor generalisation when facing cross-datasets due to the overfitting problem. For this issue, numerous studies have shown that models do not generalise well in new scenarios due to the high correlation between the representations in CNNs. Furthermore, over-parameterised CNNs have a large number of redundant representations. Therefore, we propose a novel Decorrelated Sparse Representation (DSR) regularisation. (1) It tries to minimise the correlation between feature maps to obtain decorrelated representations. (2) It forces the convolution kernels to extract meaningful features by allowing the sparse kernels to have additional optimisation. The DSR regularisation encourages diverse representations to reduce overfitting. Meanwhile, DSR can be applied to a wide range of vehicle recognition methods based on CNNs, and it does not require additional computation in the testing phase. In the experiments, DSR performs better than the original model on the intra-dataset and cross-dataset. Through ablation analysis, we find that DSR can drive the model to focus on the essential differences among all kinds of vehicles.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1351-1361"},"PeriodicalIF":1.3,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12320","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RGAM: A refined global attention mechanism for medical image segmentation","authors":"Gangjun Ning, Pingping Liu, Chuangye Dai, Mingsi Sun, Qiuzhan Zhou, Qingliang Li","doi":"10.1049/cvi2.12323","DOIUrl":"10.1049/cvi2.12323","url":null,"abstract":"<p>Attention mechanisms are popular techniques in computer vision that mimic the ability of the human visual system to analyse complex scenes, enhancing the performance of convolutional neural networks (CNN). In this paper, the authors propose a refined global attention module (RGAM) to address known shortcomings of existing attention mechanisms: (1) Traditional channel attention mechanisms are not refined enough when concentrating features, which may lead to overlooking important information. (2) The 1-dimensional attention map generated by traditional spatial attention mechanisms make it difficult to accurately summarise the weights of all channels in the original feature map at the same position. The RGAM is composed of two parts: refined channel attention and refined spatial attention. In the channel attention part, the authors used multiple weight-shared dilated convolutions with varying dilation rates to perceive features with different receptive fields at the feature compression stage. The authors also combined dilated convolutions with depth-wise convolution to reduce the number of parameters. In the spatial attention part, the authors grouped the feature maps and calculated the attention for each group independently, allowing for a more accurate assessment of each spatial position’s importance. Specifically, the authors calculated the attention weights separately for the width and height directions, similar to SENet, to obtain more refined attention weights. To validate the effectiveness and generality of the proposed method, the authors conducted extensive experiments on four distinct medical image segmentation datasets. The results demonstrate the effectiveness of RGAM in achieving state-of-the-art performance compared to existing methods.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1362-1375"},"PeriodicalIF":1.3,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12323","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient class-agnostic obstacle detection for UAV-assisted waterway inspection systems","authors":"Pablo Alonso, Jon Ander Íñiguez de Gordoa, Juan Diego Ortega, Marcos Nieto","doi":"10.1049/cvi2.12319","DOIUrl":"10.1049/cvi2.12319","url":null,"abstract":"<p>Ensuring the safety of water airport runways is essential for the correct operation of seaplane flights. Among other tasks, airport operators must identify and remove various objects that may have drifted into the runway area. In this paper, the authors propose a complete and embedded-friendly waterway obstacle detection pipeline that runs on a camera-equipped drone. This system uses a class-agnostic version of the YOLOv7 detector, which is capable of detecting objects regardless of its class. Additionally, through the usage of the GPS data of the drone and camera parameters, the location of the objects are pinpointed with 0.58 m Distance Root Mean Square. In our own annotated dataset, the system is capable of generating alerts for detected objects with a recall of 0.833 and a precision of 1.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1087-1096"},"PeriodicalIF":1.3,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12319","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To crop or not to crop: Comparing whole-image and cropped classification on a large dataset of camera trap images","authors":"Tomer Gadot, Ștefan Istrate, Hyungwon Kim, Dan Morris, Sara Beery, Tanya Birch, Jorge Ahumada","doi":"10.1049/cvi2.12318","DOIUrl":"10.1049/cvi2.12318","url":null,"abstract":"<p>Camera traps facilitate non-invasive wildlife monitoring, but their widespread adoption has created a data processing bottleneck: a camera trap survey can create millions of images, and the labour required to review those images strains the resources of conservation organisations. AI is a promising approach for accelerating image review, but AI tools for camera trap data are imperfect; in particular, classifying small animals remains difficult, and accuracy falls off outside the ecosystems in which a model was trained. It has been proposed that incorporating an object detector into an image analysis pipeline may help address these challenges, but the benefit of object detection has not been systematically evaluated in the literature. In this work, the authors assess the hypothesis that classifying animals cropped from camera trap images using a species-agnostic detector yields better accuracy than classifying whole images. We find that incorporating an object detection stage into an image classification pipeline yields a macro-average F1 improvement of around 25% on a large, long-tailed dataset; this improvement is reproducible on a large public dataset and a smaller public benchmark dataset. The authors describe a classification architecture that performs well for both whole and detector-cropped images, and demonstrate that this architecture yields state-of-the-art benchmark accuracy.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1193-1208"},"PeriodicalIF":1.3,"publicationDate":"2024-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12318","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}