Pattern Recognition: Latest Articles

Jointly stochastic fully symmetric interpolatory rules and local approximation for scalable Gaussian process regression
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-31 · DOI: 10.1016/j.patcog.2024.111125
Abstract: Three core challenges limit the effectiveness of large-scale Gaussian process regression (GPR). First, computing the inverse covariance matrix of $n$ training points takes $O(n^3)$ time, which becomes a prohibitive bottleneck on large datasets. Second, traditional local approximation methods, although widely used, often produce inconsistent predictions. Third, many aggregation strategies do not discriminate well when evaluating the importance of experts (i.e., local models), which reduces overall prediction accuracy. To address these challenges, this article proposes TDSFSIRLA, a method that integrates third-degree stochastic fully symmetric interpolatory rules (TDSFSI), local approximation, and Tsallis mutual information. TDSFSIRLA first introduces an efficient third-degree stochastic fully symmetric interpolatory rule that approximates the Gaussian kernel by generating adaptive-dimensional feature maps. This reduces the number of required orthogonal nodes, and hence the computational cost, while maintaining high approximation accuracy, providing a theoretical foundation for processing large-scale datasets.
Furthermore, to overcome the inconsistency of local approximation methods, the paper adopts the Generalized Robust Bayesian Committee Machine (GRBCM) as the aggregation framework for local experts. Owing to its inherent consistency and robustness, GRBCM keeps the predictions of the individual local models coherent, improving the stability and reliability of the overall prediction. To address uneven expert weighting, the article also introduces Tsallis mutual information as the weighting metric. Because Tsallis mutual information is sensitive to information complexity, it assigns each local expert a weight that matches its contribution, reducing the prediction bias caused by uneven weight distribution and further improving accuracy. Experiments on multiple synthetic datasets and seven representative real-world datasets show that TDSFSIRLA substantially reduces time complexity while achieving excellent prediction accuracy, demonstrating its advantages and broad applicability in large-scale Gaussian process regression.
Citations: 0
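To illustrate how a Bayesian committee machine combines local GP experts, here is a minimal sketch of the simpler robust BCM (rBCM) with the standard differential-entropy weights. It is not the paper's GRBCM, which additionally uses a global communication expert, and the paper replaces these weights with Tsallis mutual information. It assumes a zero-mean GP prior, and the function name is hypothetical.

```python
import numpy as np

def rbcm_aggregate(means, variances, prior_var):
    """Combine K local GP experts' predictions at one test point (robust BCM).

    means, variances: shape (K,) predictive means/variances of the experts.
    prior_var: GP prior variance k(x*, x*); a zero prior mean is assumed.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Entropy-based expert weights: 0.5 * (log prior_var - log var_k).
    # Confident experts (small variance) get larger weights.
    beta = 0.5 * (np.log(prior_var) - np.log(variances))
    # Aggregated precision; the prior term corrects for the prior
    # being implicitly counted once per expert.
    precision = np.sum(beta / variances) + (1.0 - np.sum(beta)) / prior_var
    var = 1.0 / precision
    mean = var * np.sum(beta * means / variances)
    return mean, var

m, v = rbcm_aggregate([1.0, 1.0], [0.5, 0.5], prior_var=1.0)
```

An expert that learned nothing (variance equal to the prior) receives zero weight, so the prediction falls back to the prior. This per-expert weighting is the part that TDSFSIRLA replaces with Tsallis mutual information.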
Apply prior feature integration to sparse object detectors
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-31 · DOI: 10.1016/j.patcog.2024.111103
Abstract: Using noisy boxes as queries for sparse object detection has become an active research topic in recent years. Sparse R-CNN achieves one-to-one prediction from noisy boxes to object boxes, while DiffusionDet recasts the prediction process of Sparse R-CNN as multiple diffusion processes. Notably, Sparse R-CNN and its improved variants all rely on an FPN to extract features for RoI Align. However, each target is matched to only one FPN feature map, which is inefficient and resource-consuming. Moreover, these sparse detectors crop regions from noisy boxes for prediction, so the boxes fail to capture global features. In this work, we rethink the detection paradigm of sparse object detection and propose two improvements, yielding a new object detector called Prior Sparse R-CNN. First, we replace the original FPN neck with a neck that outputs a single feature map to improve efficiency, and we design an aggregated encoder after the neck that handles the object scale problem through dilated residual blocks and feature aggregation. Second, we introduce prior knowledge into the noisy boxes to enhance their understanding of global representations: we design a Region Generation Network (RGN) that generates global object information and fuses it with the noisy-box features as prior knowledge. Prior Sparse R-CNN reaches a state-of-the-art 47.0 AP on the COCO 2017 validation set, surpassing DiffusionDet by 1.5 AP with a ResNet-50 backbone.
Additionally, our training epochs require only 3/5 of the time.
Citations: 0
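The aggregated encoder is described as using dilated residual blocks. As a generic illustration only (the paper's exact block layout, channel counts, and 1x1 projections are not given here), the sketch below shows a single-channel dilated convolution and a residual block built from it; both function names are hypothetical.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Single-channel k x k dilated cross-correlation, zero 'same' padding (k odd)."""
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)  # zero padding on every side
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Each kernel tap reads the input shifted by a multiple of `dilation`,
            # enlarging the receptive field without adding parameters.
            out += kernel[i, j] * xp[i * dilation:i * dilation + h,
                                     j * dilation:j * dilation + w]
    return out

def dilated_residual_block(x, k1, k2, dilation):
    """x + ReLU(conv(ReLU(conv(x)))): the skip path preserves the input."""
    hidden = np.maximum(dilated_conv2d(x, k1, dilation), 0.0)
    return x + np.maximum(dilated_conv2d(hidden, k2, dilation), 0.0)
```

With dilation 2, a 3x3 kernel covers a 5x5 window. Stacking blocks with different dilations is one common way a single-level feature map can cover objects of several scales.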
Local and global self-attention enhanced graph convolutional network for skeleton-based action recognition
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-31 · DOI: 10.1016/j.patcog.2024.111106
Abstract: The current successful paradigm for skeleton-based action recognition combines Graph Convolutional Networks (GCNs), which model spatial correlations, with Temporal Convolutional Networks (TCNs), which extract motion features. Such GCN-TCN approaches usually rely on local graph convolution, which limits their ability to capture complex correlations among distant joints and to represent long-range dependencies. Although self-attention, which originated in Transformers, shows great potential for modeling correlations among all joints, Transformer-based methods are usually computationally expensive and ignore the physical connectivity of the human skeleton. To address these issues, we propose a novel Local-Global Self-Attention Enhanced Graph Convolutional Network (LG-SGNet) that learns both local and global representations in the spatial-temporal dimension simultaneously. Our approach consists of three components. The Local-Global Graph Convolutional Network (LG-GCN) module extracts local and global spatial feature representations through parallel, channel-specific global and local spatial modeling. The Local-Global Temporal Convolutional Network (LG-TCN) module performs joint-wise global temporal modeling with multi-head self-attention in parallel with local temporal modeling, forming a new multi-branch temporal convolution structure that captures both long-range dependencies and subtle temporal structures. Finally, the Dynamic Frame Weighting Module (DFWM) reweights the frames of a skeleton action sequence, allowing the model to focus adaptively on representative frames for more efficient action recognition.
Extensive experiments demonstrate that LG-SGNet performs very competitively against state-of-the-art methods. Our project website is available at https://github.com/DingYyue/LG-SGNet.
Citations: 0
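The LG-TCN branch applies self-attention along time separately for each joint. A minimal single-head sketch of that idea follows; the paper uses multi-head attention alongside a local temporal branch, and the (T, V, C) layout and function names here are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def joint_wise_temporal_attention(x, wq, wk, wv):
    """Single-head self-attention over frames, applied independently per joint.

    x: (T, V, C) skeleton features (frames, joints, channels).
    wq, wk, wv: (C, D) projection matrices.
    Returns (T, V, D).
    """
    xj = np.transpose(x, (1, 0, 2))          # (V, T, C): one sequence per joint
    q, k, v = xj @ wq, xj @ wk, xj @ wv      # each (V, T, D)
    scores = q @ np.transpose(k, (0, 2, 1)) / np.sqrt(q.shape[-1])  # (V, T, T)
    attn = softmax(scores, axis=-1)          # every frame attends to all frames
    out = attn @ v                           # (V, T, D)
    return np.transpose(out, (1, 0, 2))      # back to (T, V, D)
```

Because every frame attends to every other frame of the same joint, a single layer already links distant time steps. A temporal convolution would need many stacked layers to cover the same span.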
Explainability-based knowledge distillation
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-30 · DOI: 10.1016/j.patcog.2024.111095
Abstract: Knowledge distillation (KD) is a popular approach for deep model acceleration. Based on the knowledge distilled, we categorize KD methods as label-related and structure-related. The former distills highly abstract (high-level) knowledge, e.g., logits; the latter uses spatial (low- or mid-level feature) knowledge. However, existing KD methods are usually not explainable, i.e., we do not know what knowledge is transferred during distillation. In this work, we propose a new KD method, Explainability-based Knowledge Distillation (Exp-KD). Specifically, we use the class activation map (CAM) as explainable knowledge that effectively captures both label- and structure-related information during distillation. We conduct extensive experiments, including image classification on CIFAR-10, CIFAR-100 and ImageNet, and explainability tests on ImageNet and ImageNet-Segmentation. The results show the strong effectiveness and explainability of Exp-KD compared with the state of the art. Code is available at https://github.com/Blenderama/Exp-KD.
Citations: 0
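Exp-KD uses the class activation map (CAM) as the transferred knowledge. The sketch below computes a CAM in the classic way, as a weighted sum of the last convolutional feature maps using the classifier weights. It then compares teacher and student maps with a mean-squared error. That loss, the ReLU/max normalization, and the equal-resolution assumption are illustrative choices, not necessarily the paper's, and the function names are hypothetical.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM for one class: weighted sum of the last conv feature maps.

    features: (K, H, W) feature maps; fc_weights: (num_classes, K) classifier
    weights applied after global average pooling.
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)      # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1]

def cam_distillation_loss(t_feat, t_fc, s_feat, s_fc, class_idx):
    """Hypothetical CAM-matching loss: MSE between normalized teacher and
    student CAMs (assumes both feature maps share the same H x W)."""
    t_cam = class_activation_map(t_feat, t_fc, class_idx)
    s_cam = class_activation_map(s_feat, s_fc, class_idx)
    return float(np.mean((t_cam - s_cam) ** 2))
```

A CAM depends on both the classifier weights (label-related) and the spatial feature maps (structure-related). Matching CAMs therefore touches both kinds of knowledge that the abstract distinguishes.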
Multi-task OCTA image segmentation with innovative dimension compression
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-30 · DOI: 10.1016/j.patcog.2024.111123
Abstract: Optical Coherence Tomography Angiography (OCTA) plays a crucial role in the early detection and continuous monitoring of ocular diseases, which relies on accurate multi-tissue segmentation of retinal images. Existing OCTA segmentation methods typically use single-task designs that do not fully exploit the volumetric data in these images. To bridge this gap, our study introduces H2C-Net, a novel network architecture for simultaneous and precise segmentation of several retinal structures, including capillaries, arteries, veins, and the foveal avascular zone (FAZ). At its core, H2C-Net consists of a plug-and-play Height-Channel Module (H2C) and an Enhanced U-shaped Network (GPC-Net). The H2C module converts the height information of the OCTA volume into channel information through a Squeeze operation, achieving lossless dimensionality reduction from 3D to 2D, and provides "soft layering" information through unidirectional pooling. Meanwhile, to guide the network to focus on channels during training, U-Net is enhanced with group normalization, a channel attention mechanism, and the Parametric Rectified Linear Unit (PReLU), which reduces the dependence on batch size and strengthens the network's ability to extract salient features. Extensive experiments on two subsets of the publicly available OCTA-500 dataset show that H2C-Net outperforms existing state-of-the-art methods. It achieves average Intersection over Union (IoU) scores of 82.84% and 88.48%, improvements of 0.81% and 1.59%, respectively. Similarly, the average Dice scores rise to 90.40% and 93.76%, exceeding previous benchmarks by 0.42% and 0.94%.
The proposed H2C-Net performs excellently in OCTA image segmentation, providing an efficient and accurate multi-task segmentation solution for ophthalmic diagnostics. The code is publicly available at: https://github.com/IAAI-SIT/H2C-Net.
Citations: 0
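The results above are reported as IoU and Dice scores. For reference, here is a minimal sketch of both metrics on a pair of binary masks. The function name and the small `eps` smoothing term are assumptions, and the paper's per-class averaging protocol is not reproduced.

```python
import numpy as np

def iou_and_dice(pred, target, eps=1e-7):
    """Binary-mask IoU = |P & G| / |P | G| and Dice = 2|P & G| / (|P| + |G|).

    eps avoids division by zero; two empty masks score 1 on both metrics.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = (inter + eps) / (union + eps)
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)

iou, dice = iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single mask pair, Dice = 2·IoU / (1 + IoU), so Dice is always at least as large as IoU. This is why the Dice figures above exceed the IoU figures.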
Cross-modal independent matching network for image-text retrieval
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-29 · DOI: 10.1016/j.patcog.2024.111096
Abstract: Image-text retrieval serves as a bridge between vision and language. Mainstream cross-modal matching methods perform effective cross-modal interaction and achieve high theoretical performance, but they are inefficient. Modality-independent matching methods are more efficient but perform worse. Balancing matching efficiency and performance is therefore a challenge in image-text retrieval. In this paper, we propose a new Cross-modal Independent Matching Network (CIMN) for image-text retrieval. Specifically, we first use the proposed Feature Relationship Reasoning (FRR) to infer neighborhood and potential relations among modal features. Then, we introduce Graph Pooling (GP), based on graph convolutional networks, to perform global semantic aggregation within each modality. Finally, we introduce the Gravitation Loss (GL), which incorporates sample mass into the learning process. This loss corrects the matching relationships both between and within modalities, avoiding the equal treatment of all samples in the traditional triplet loss. Extensive experiments on the Flickr30K and MSCOCO datasets demonstrate the superiority of the proposed method.
It achieves a good balance between matching efficiency and performance, outperforms other independent matching methods, and attains retrieval accuracy comparable to some mainstream cross-matching methods with an order of magnitude lower inference time.
Citations: 0
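The Gravitation Loss is motivated by the fact that a traditional triplet loss treats every sample equally. For comparison, here is a minimal sketch of a standard bidirectional hinge triplet loss over an image-text similarity matrix. The optional `weights` argument is a hypothetical hook that shows where per-sample weighting could enter; it is not the paper's mass formulation.

```python
import numpy as np

def triplet_ranking_loss(sim, margin=0.2, weights=None):
    """Bidirectional hinge triplet loss over an image-text similarity matrix.

    sim[i, j]: similarity of image i and caption j; matched pairs lie on the
    diagonal. weights: optional per-pair weights (hypothetical hook); None
    gives the standard, equally weighted loss.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    pos = np.diag(sim)
    mask = ~np.eye(n, dtype=bool)  # exclude each positive pair from its negatives
    # Image-to-text: for image i, every non-matching caption j is a negative.
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None]) * mask
    # Text-to-image: for caption j, every non-matching image i is a negative.
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :]) * mask
    per_sample = cost_i2t.sum(axis=1) + cost_t2i.sum(axis=0)
    if weights is None:
        weights = np.ones(n)
    return float(np.sum(np.asarray(weights) * per_sample) / n)
```

With uniform weights, every violated margin counts the same regardless of how informative the sample is. Loss designs such as GL replace that uniform treatment.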
Fully exploring object relation interaction and hidden state attention for video captioning
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-28 · DOI: 10.1016/j.patcog.2024.111138
Abstract: Video Captioning (VC) is the challenging task of automatically generating natural-language sentences that describe video content. Because a video often contains multiple objects, it is crucial to identify these objects and model the relationships between them. Previous models usually adopt Graph Convolutional Networks (GCNs) to infer relational information via object nodes, but their relational reasoning suffers from uncertainty and over-smoothing. To tackle these issues, we propose a Knowledge Graph based Video Captioning Network (KG-VCN) that fully explores object relation interaction, hidden states, and attention enhancement. In the encoding stage, we present a Graph and Convolution Hybrid Encoder (GCHE), which uses an object detector to find visual objects with bounding boxes for the Knowledge Graph (KG) and a Convolutional Neural Network (CNN). To model intrinsic relations between detected objects, we propose a knowledge graph based Object Relation Graph Interaction (ORGI) module. In ORGI, we design triplets (head, relation, tail) to mine object relations efficiently, and create a global node that enables adequate information flow among all graph nodes so that relations are not missed. To produce accurate and rich captions, we propose a hidden State and Attention Enhanced Decoder (SAED) that integrates hidden states with dynamically updated attention features. SAED accepts both relational and visual features, adopts Long Short-Term Memory (LSTM) to produce hidden states, and dynamically updates attention features. Unlike existing methods, we concatenate state and attention features to predict the next word sequentially.
To demonstrate the effectiveness of our model, we conduct experiments on three well-known datasets (MSVD, MSR-VTT, VaTeX), on which it significantly outperforms existing state-of-the-art models.
Citations: 0
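ORGI adds a global node so that information can flow between all object nodes. The sketch below shows the general idea on a plain GCN layer: connect one extra node to every object node, initialize it (by assumption) with the mean object feature, and propagate. KG-VCN's triplet-based knowledge graph is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def add_global_node(adj):
    """Append one global node connected to every existing object node."""
    n = adj.shape[0]
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = adj
    out[n, :n] = 1.0   # global -> objects
    out[:n, n] = 1.0   # objects -> global
    return out

def gcn_layer(x, adj, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(norm @ x @ w, 0.0)

# Two objects with no detected relation: without the global node they never interact.
x = np.array([[1.0], [0.0]])
adj = np.zeros((2, 2))
x_g = np.vstack([x, x.mean(axis=0, keepdims=True)])  # assumed init: mean of objects
out_g = gcn_layer(x_g, add_global_node(adj), np.eye(1))
```

Any two object nodes are at most two hops apart through the global node. A relation the detector missed can therefore still pass information within two layers.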
A Newton interpolation network for smoke semantic segmentation
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-28 · DOI: 10.1016/j.patcog.2024.111119
Abstract: Smoke varies widely in visual appearance, which is highly adverse to visual segmentation. Furthermore, its semi-transparency often produces highly complicated mixtures of smoke and background. These factors make labelling and segmenting smoke regions very difficult. To improve the accuracy of smoke segmentation, we propose a Newton Interpolation Network (NINet) for visual smoke semantic segmentation. Rather than simply concatenating or point-wise adding multi-scale encoded feature maps for information fusion or reuse, we design a Newton Interpolation Module (NIM) that extracts structured information by analyzing the feature values at the same position across encoded feature maps of different scales. Features interpolated by our NIM contain long-range dependencies and semantic structures across different levels, whereas traditional fusion of multi-scale feature maps cannot model the intrinsic structures embedded in these maps. To obtain multi-scale structured information, we apply the proposed NIM repeatedly at different levels of the decoding stage. In addition, we use more encoded feature maps to construct a higher-order Newton interpolation polynomial that extracts higher-order information.
Extensive experiments show that our method significantly outperforms existing state-of-the-art algorithms on virtual and real smoke datasets, and ablation experiments confirm the effectiveness of our NIMs.
Citations: 0
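NIM treats the values at one spatial position across feature maps of different scales as samples to interpolate. As a sketch under stated assumptions, the code below builds Newton divided differences element-wise over K same-sized maps, with hypothetical node positions `xs` (for example, scale indices). It then evaluates the Newton-form polynomial at a target node. This is textbook Newton interpolation, not NINet's exact module.

```python
import numpy as np

def newton_coefficients(xs, ys):
    """Divided-difference coefficients, computed element-wise.

    xs: (K,) distinct nodes; ys: (K, ...) values at each node, e.g. K
    resized feature maps of shape (H, W). Returns coef of shape (K, ...).
    """
    coef = np.array(ys, dtype=float)  # copy, updated in place
    k = len(xs)
    for order in range(1, k):
        # Walk from the bottom so lower-order differences are still available.
        for i in range(k - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate c0 + c1(x-x0) + c2(x-x0)(x-x1) + ... with nested multiplication."""
    result = np.array(coef[-1], dtype=float)
    for i in range(len(xs) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result
```

With K maps the polynomial has degree K - 1. Feeding in more encoded feature maps therefore raises the interpolation order, matching the abstract's remark that additional maps yield higher-order information.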
Exploring sample relationship for few-shot classification
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-28 · DOI: 10.1016/j.patcog.2024.111089
Abstract: Few-shot classification (FSC) is a challenging problem that aims to identify novel classes from limited samples. Most existing methods employ vanilla transfer learning or episodic meta-training to learn a feature extractor, and then measure the similarity between the query image and the few support examples of novel classes. However, these approaches learn feature representations only from individual images and overlook the interrelationships among images. This neglect hinders the learning of more discriminative feature representations, limiting potential improvements in few-shot classification performance. To address this issue, we propose a Sample Relationship Exploration (SRE) module, comprising Sample-level Attention (SA), Explicit Guidance (EG) and Channel-wise Adaptive Fusion (CAF) components, to learn discriminative category-related features. Specifically, we first employ the SA component to explore the similarity relationships among samples and obtain aggregated features of similar samples. Furthermore, to enhance the robustness of these features, we introduce the EG component, which explicitly guides the learning of sample relationships by providing an ideal affinity map among samples. Finally, the CAF component performs a weighted fusion of the original and aggregated features, yielding category-related embeddings. The proposed module is plug-and-play and can be embedded into both transfer-learning-based and meta-learning-based few-shot classification frameworks. Extensive experiments on benchmark datasets show that the module effectively improves performance over baseline models and performs competitively against state-of-the-art algorithms.
The source code is available at https://github.com/Chenguoz/SRE.
Citations: 0
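The three SRE components can be sketched together as follows. Cosine-similarity softmax attention aggregates similar samples (SA). A label-based matrix serves as one possible "ideal affinity map" (EG). A per-channel sigmoid gate fuses original and aggregated features (CAF). The temperature, the gate parameterization, and all function names are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sample_attention_fusion(feats, gate_logits, temperature=1.0):
    """Aggregate each sample with similar samples, then fuse channel-wise.

    feats: (N, C) embeddings of the N images in a batch or episode.
    gate_logits: (C,) hypothetical learnable per-channel fusion parameters.
    Returns the fused (N, C) features and the (N, N) affinity matrix.
    """
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    affinity = softmax(normed @ normed.T / temperature, axis=1)  # rows sum to 1
    aggregated = affinity @ feats                                # (N, C)
    alpha = 1.0 / (1.0 + np.exp(-gate_logits))                   # per-channel weight in (0, 1)
    return alpha * feats + (1.0 - alpha) * aggregated, affinity

def ideal_affinity(labels):
    """Hypothetical target affinity: uniform weight over same-class samples."""
    same = (labels[:, None] == labels[None, :]).astype(float)
    return same / same.sum(axis=1, keepdims=True)
```

A guidance loss could, for instance, pull the learned `affinity` toward `ideal_affinity(labels)` during training, while the gate decides per channel how much of the aggregated signal to keep.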
SeaTrack: Rethinking Observation-Centric SORT for Robust Nearshore Multiple Object Tracking
IF 7.5 · CAS Zone 1 · Computer Science
Pattern Recognition · Pub Date: 2024-10-28 · DOI: 10.1016/j.patcog.2024.111091
Abstract: Nearshore Multiple Object Tracking (NMOT) aims to locate and associate nearshore objects. Current approaches use Automatic Identification Systems (AIS) and radar for this task. However, video signals can describe the visual appearance of nearshore objects without prior information such as identity, location, or motion. In addition, sea clutter does not affect the capture of living objects by visual sensors. Recognizing this, we analyzed three key long-term challenges of vision-based NMOT and proposed a tracking pipeline that relies solely on motion information. Maritime objects are highly susceptible to being obscured or submerged by waves, which fragments their tracklets. We first introduced guiding modulation to address the long-term occlusion and interaction of maritime objects. We then modeled confidence, altitude, and angular momentum to mitigate the effects of motion blur, ringing, and overshoot artifacts on observations in unstable imaging environments. Additionally, we designed a motion fusion mechanism that combines long-term macro tracklets with short-term fine-grained tracklets. This correction mechanism reduces the estimation variance of the Kalman Filter (KF) to cope with the substantial nonlinear motion of maritime objects.
We call this pipeline SeaTrack; it remains simple, online, and real-time, and demonstrates excellent performance and scalability in benchmark evaluations.
Citations: 0
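SeaTrack builds on SORT-style tracking, where a Kalman filter predicts each track's motion between frames. Here is a minimal constant-velocity Kalman filter for a box center, as background only. OC-SORT's state also includes box scale and aspect ratio, and the noise levels and class name here are assumptions.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal Kalman filter for a box center (x, y) under constant velocity."""

    def __init__(self, x, y, q=1e-2, r=1.0):
        self.state = np.array([x, y, 0.0, 0.0])     # [x, y, vx, vy]
        self.P = np.eye(4) * 10.0                   # large initial uncertainty
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # dt = 1 frame
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only position is observed
        self.Q = np.eye(4) * q                      # process noise covariance
        self.R = np.eye(2) * r                      # measurement noise covariance

    def predict(self):
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, z):
        innovation = np.asarray(z, dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2]

kf = ConstantVelocityKF(0.0, 0.0)
for t in range(1, 51):
    kf.predict()
    kf.update((2.0 * t, 0.0))   # object moving +2 px per frame along x
```

When detections are missing (for example, an object hidden by a wave), only `predict` runs and the uncertainty `P` grows. This is the regime in which wave-induced occlusion and nonlinear motion degrade a linear KF.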
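Contact us: info@booksci.cn. Book学术 provides a free academic resource search service that helps scholars in China and abroad find Chinese- and English-language literature, and aims to offer a convenient, high-quality service experience. Copyright © 2023 布克学术. All rights reserved.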