Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128763

CoFiNet: Unveiling camouflaged objects with multi-scale finesse

Abstract: Camouflaged Object Detection (COD) is a critical computer vision task aimed at identifying concealed objects, with applications spanning military, industrial, medical, and monitoring domains. To address the problem of poor segmentation of fine details, we introduce a novel method for camouflaged object detection, named CoFiNet. Our approach focuses on multi-scale feature fusion and extraction, with special attention to the segmentation of detailed features, enhancing the model’s ability to detect camouflaged objects effectively. CoFiNet adopts a coarse-to-fine strategy: a multi-scale feature integration module is leveraged to enhance the model’s capability to fuse contextual features, and a multi-activation selective kernel module grants the model the ability to autonomously alter its receptive field, so it can choose an appropriate receptive field for camouflaged objects of different sizes. During mask generation, we employ a dual-mask strategy for image segmentation, separating the reconstruction of coarse and fine masks, which significantly enhances the model’s capacity to learn details. Comprehensive experiments on four different datasets demonstrate that CoFiNet achieves state-of-the-art performance across all of them, underscoring its effectiveness in camouflaged object detection and its potential in various practical application scenarios.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128755

Interpretable few-shot learning with online attribute selection

Abstract: Few-shot learning (FSL) presents a challenging learning problem in which only a few samples are available for each class. Decision interpretation is especially important in few-shot classification because the chance of error is greater than in traditional classification, yet the majority of previous FSL methods are black-box models. In this paper, we propose an inherently interpretable model for FSL based on human-friendly attributes. Human-friendly attributes have previously been used to train models with the potential for human interaction and interpretability, but such approaches do not extend directly to the few-shot classification scenario. We therefore propose an online attribute selection mechanism that filters out irrelevant attributes in each episode, improving accuracy and aiding interpretability by reducing the number of attributes that participate in each episode. We further propose a mechanism that automatically detects episodes where the pool of available human-friendly attributes is insufficient and augments it by engaging learned unknown attributes. We demonstrate that the proposed method achieves results on par with black-box few-shot learning models on four widely used datasets. We also empirically evaluate how well different models’ decisions align with human understanding, and show that our model outperforms the comparison methods on this criterion.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128796

Adversarial diffusion for few-shot scene adaptive video anomaly detection

Abstract: Few-shot anomaly detection for video surveillance is challenging due to the diverse nature of target domains. Existing methods treat it as a one-class classification problem, training on a reduced sample of nominal scenes and relying on either reconstructive or predictive frame models to learn a manifold against which outliers can be detected at inference. We posit that the quality of image reconstruction or future-frame prediction is inherently important for identifying anomalous pixels in video frames. In this paper, we enhance image synthesis and mode coverage for video anomaly detection (VAD) by integrating a Denoising Diffusion model with a future-frame prediction model. Our novel VAD pipeline combines a Generative Adversarial Network with denoising diffusion to learn the underlying non-anomalous data distribution and generate high-fidelity future-frame samples in a single step. We further regularize the image reconstruction with perceptual quality metrics such as the Multi-scale Structural Similarity Index Measure and Peak Signal-to-Noise Ratio, ensuring high-quality output under few episodic training iterations. Extensive experiments demonstrate that our method outperforms state-of-the-art techniques across multiple benchmarks, validating that high-quality image synthesis in frame prediction leads to robust anomaly detection in videos.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128788

Physically-guided open vocabulary segmentation with weighted patched alignment loss

Abstract: Open vocabulary segmentation is a challenging task that aims to segment objects from thousands of unseen categories. Directly applying CLIP to open-vocabulary semantic segmentation is difficult due to the granularity gap between its image-level contrastive learning and the pixel-level recognition required for segmentation. To address these challenges, we propose a unified pipeline that leverages physical structure regularization to enhance the generalizability and robustness of open vocabulary segmentation. By incorporating physical structure information, which is independent of the training data, we aim to reduce bias and improve the model’s performance on unseen classes. We use low-level structures such as edges and keypoints as regularization terms, since they are easy to obtain and strongly correlated with segmentation boundaries; these structures serve as pseudo-ground truth to supervise the model. Furthermore, inspired by the effectiveness of comparative learning in human cognition, we introduce a weighted patched alignment loss, which contrasts similar and dissimilar samples to acquire low-dimensional representations that capture the distinctions between object classes. By incorporating physical knowledge and leveraging the weighted patched alignment loss, we improve the model’s generalizability, robustness, and ability to recognize diverse object classes. Experiments on the COCO Stuff, Pascal VOC, Pascal Context-59, Pascal Context-459, ADE20K-150, and ADE20K-847 datasets demonstrate that our method consistently improves over baselines and achieves a new state of the art in open vocabulary segmentation.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128789

Adaptive feature alignment network with noise suppression for cross-domain object detection

Abstract: Recently, unsupervised domain adaptive object detection methods have been proposed to address the challenge of detecting objects across different domains without labeled data in the target domain. These methods focus on aligning features at either the image level or the instance level. However, due to the absence of annotations in the target domain, existing approaches encounter background noise at the image level and prototype aggregation noise at the instance level. To tackle these issues, we introduce a novel adaptive feature alignment network for cross-domain object detection, comprising two key modules. First, we present an adaptive foreground-aware attention module equipped with a set of learnable part prototypes for image-level alignment; it dynamically generates foreground attention maps, enabling the model to prioritize foreground features and reduce the impact of background noise. Second, we propose a class-aware prototype alignment module that incorporates an optimal transport algorithm for instance-level alignment, mitigating the adverse effects of region–prototype aggregation noise by aligning prototypes with instances based on their semantic similarities. By integrating these two modules, our approach achieves better image-level and instance-level feature alignment. Extensive experiments across three challenging scenarios demonstrate the effectiveness of our method, which outperforms state-of-the-art approaches in object detection performance.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128772

Active self-semi-supervised learning for few labeled samples

Abstract: Training deep models with limited annotations poses a significant challenge across diverse practical domains. Combining semi-supervised learning with a self-supervised model offers the potential to enhance label efficiency, but this approach faces a bottleneck in reducing the need for labels: we observed that the semi-supervised model disrupts valuable information from self-supervised learning when only limited labels are available. To address this issue, this paper proposes a simple yet effective framework, active self-semi-supervised learning (AS3L), which bootstraps semi-supervised models with prior pseudo-labels (PPLs) obtained by label propagation over self-supervised features. We observe that the accuracy of PPLs is affected not only by the quality of the features but also by the selection of the labeled samples, so we develop active learning and label propagation strategies to obtain accurate PPLs. Consequently, our framework significantly improves model performance under limited annotations while converging quickly. On image classification tasks across four datasets, our method outperforms the baseline by an average of 5.4%, and it reaches the baseline’s accuracy in about one third of the training time.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128757

Out-of-vocabulary handling and topic quality control strategies in streaming topic models

Abstract: Topic models have become ubiquitous tools for analyzing streaming data. However, existing streaming topic models suffer from several limitations when applied to real-world data streams, including the inability to accommodate evolving vocabularies and to control topic quality throughout the streaming process. In this paper, we propose a novel streaming topic modeling approach that dynamically adapts to the changing nature of data streams. Our method leverages Byte-Pair Encoding embeddings (BPEmb) to resolve the out-of-vocabulary problem that arises as new words appear in the stream. Additionally, we introduce a topic change variable that provides fine-grained control over updates to topic parameters, and we present a preservation approach that retains high-coherence topics at each time step, helping to maintain semantic quality. To further enhance adaptability, our method allows the size of the topic space to be adjusted dynamically as needed. To the best of our knowledge, we are the first to address vocabulary expansion and topic-quality maintenance during the streaming process. Extensive experiments show the superior effectiveness of our method.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128767

A survey of deep learning algorithms for colorectal polyp segmentation

Abstract: Early detection and removal of cancerous colorectal polyps can effectively reduce the risk of colorectal cancer. Computer intelligent segmentation techniques (CIST) can improve the polyp detection rate by drawing the boundaries of colorectal polyps clearly and completely. Four challenges encountered by deep learning methods for colorectal polyp segmentation are considered: the limitations of classical deep learning (DL) algorithms, the impact of dataset quantity and quality, the diversity of the intrinsic characteristics of lesions, and the heterogeneity of images across datasets from different centers. The improved DL algorithms for intelligent polyp segmentation are detailed, along with the key neural network modules designed to address the above challenges. In addition, the public and private datasets of colorectal polyp images and videos are summarized. Finally, development trends in deep-learning-based polyp segmentation are discussed.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128780

Perceptual metric for face image quality with pixel-level interpretability

Abstract: This paper tackles the shortcomings of image evaluation metrics in assessing facial image quality. Conventional metrics neither accurately reflect the unique attributes of facial images nor correspond to human visual perception. To address these issues, we introduce a novel metric designed specifically for faces, built on a learning-based adversarial framework comprising a generator that simulates face restoration and a discriminator that evaluates quality. Drawing inspiration from facial neuroscience studies, our metric emphasizes primary facial features, acknowledging that minor changes to the eyes, nose, and mouth can significantly impact perception. Another key limitation of existing image evaluation metrics is their focus on a single image-level value, with no insight into how different areas of the image contribute to the overall assessment; our metric offers interpretability regarding how each region of the image is evaluated. Comprehensive experimental results confirm that our face-specific metric surpasses traditional general image quality assessment metrics on facial images, including both full-reference and no-reference methods. The code and models are available at https://github.com/AIM-SKKU/IFQA.
Neurocomputing | Pub Date: 2024-10-28 | DOI: 10.1016/j.neucom.2024.128782

A pseudo-3D coarse-to-fine architecture for 3D medical landmark detection

Abstract: The coarse-to-fine architecture is a benchmark approach for enhancing the accuracy of 3D medical landmark detection. However, incorporating 3D convolutional neural networks into the coarse-to-fine architecture greatly increases the number of model parameters, making deployment in clinical applications costly. This paper introduces a novel lightweight pseudo-3D coarse-to-fine architecture, consisting of a Plane-wise Attention Pseudo-3D (PA-P3D) model and a Spatial Separation Pseudo-3D (SS-P3D) model. The PA-P3D inherits the lightweight structure of the general pseudo-3D design and enhances cross-plane feature interaction in 3D medical images, while the SS-P3D replaces the 3D model with three spatially separated 2D models that simultaneously detect 2D landmarks on the axial, sagittal, and coronal planes. Compared with the conventional coarse-to-fine architecture, the proposed method requires only about a quarter of the model parameters (60% reduced by PA-P3D and 40% reduced by SS-P3D) while improving landmark detection performance. Experimental results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on both a public dataset for mandibular molar landmark detection and a private dataset for cephalometric landmark detection. Overall, this paper highlights the potential of the coarse-to-fine method for cost-effective model deployment thanks to its lightweight structure.