Pattern Recognition: Latest Articles

Apply prior feature integration to sparse object detectors
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-31, DOI: 10.1016/j.patcog.2024.111103
Yu Qian, Qijin Wang, Changxin Wu, Chao Wang, Long Cheng, Yating Hu, Hongqiang Wang
{"title":"Apply prior feature integration to sparse object detectors","authors":"Yu Qian ,&nbsp;Qijin Wang ,&nbsp;Changxin Wu ,&nbsp;Chao Wang ,&nbsp;Long Cheng ,&nbsp;Yating Hu ,&nbsp;Hongqiang Wang","doi":"10.1016/j.patcog.2024.111103","DOIUrl":"10.1016/j.patcog.2024.111103","url":null,"abstract":"<div><div>Noisy boxes as queries for sparse object detection has become a hot topic of research in recent years. Sparse R-CNN achieves one-to-one prediction from noisy boxes to object boxes, while DiffusionDet transforms the prediction process of Sparse R-CNN into multiple diffusion processes. Especially, algorithms such as Sparse R-CNN and its improved versions all rely on FPN to extract features for ROI Aligning. But the target only matching one feature map in FPN, which is inefficient and resource-consuming. otherwise, these methods like sparse object detection crop regions from noisy boxes for prediction, resulting in boxes failing to capture global features. In this work, we rethink the detection paradigm of sparse object detection and propose two improvements and produce a new object detector, called Prior Sparse R-CNN. Firstly, we replace the original FPN neck with a neck that only outputs one feature map to improve efficiency. Then, we design aggregated encoder after neck to solve the object scale problem through dilated residual blocks and feature aggregation. Another improvement is that we introduce prior knowledge for noisy boxes to enhance their understanding of global representations. Region Generation network (RGN) is designed by us to generate global object information and fuse it with the features of noisy boxes as prior knowledge. Prior Sparse R-CNN reaches the state-of-the-art 47.0 AP on COCO 2017 validation set, surpassing DiffusionDet by 1.5 AP with ResNet-50 backbone. Additionally, our training epoch requires only 3/5 of the time.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111103"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
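A minimal PyTorch sketch of the single-map neck idea described above: parallel dilated residual blocks cover different object scales and are aggregated by a 1x1 convolution. The channel width, dilation rates, and fusion choice are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """3x3 dilated convolutions with a residual connection (dilation rate assumed)."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))

class AggregatedEncoder(nn.Module):
    """Fuse parallel dilated branches so a single feature map can serve all scales."""
    def __init__(self, channels: int = 256, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([DilatedResidualBlock(channels, d) for d in dilations])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

feat = torch.randn(2, 256, 64, 64)        # the single feature map from the neck
print(AggregatedEncoder()(feat).shape)    # torch.Size([2, 256, 64, 64])
```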
Local and global self-attention enhanced graph convolutional network for skeleton-based action recognition
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-31, DOI: 10.1016/j.patcog.2024.111106
Zhize Wu, Yue Ding, Long Wan, Teng Li, Fudong Nian
{"title":"Local and global self-attention enhanced graph convolutional network for skeleton-based action recognition","authors":"Zhize Wu ,&nbsp;Yue Ding ,&nbsp;Long Wan ,&nbsp;Teng Li ,&nbsp;Fudong Nian","doi":"10.1016/j.patcog.2024.111106","DOIUrl":"10.1016/j.patcog.2024.111106","url":null,"abstract":"<div><div>The current successful paradigm for skeleton-based action recognition is the combination of Graph Convolutional Networks (GCNs) modeling spatial correlations, and Temporal Convolution Networks (TCNs), extracting motion features. Such GCN-TCN-based approaches usually rely on local graph convolution operations, which limits their ability to capture complicated correlations among distant joints, as well as represent long-range dependencies. Although the self-attention originated from Transformers shows great potential in correlation modeling of global joints, the Transformer-based methods are usually computationally expensive and ignore the physical connectivity structure of the human skeleton. To address these issues, we propose a novel Local-Global Self-Attention Enhanced Graph Convolutional Network (LG-SGNet) to simultaneously learn both local and global representations in the spatial–temporal dimension. Our approach consists of three components: The Local-Global Graph Convolutional Network (LG-GCN) module extracts local and global spatial feature representations by parallel channel-specific global and local spatial modeling. The Local-Global Temporal Convolutional Network (LG-TCN) module performs a joint-wise global temporal modeling using multi-head self-attention in parallel with local temporal modeling. This constitutes a new multi-branch temporal convolution structure that effectively captures both long-range dependencies and subtle temporal structures. Finally, the Dynamic Frame Weighting Module (DFWM) adjusts the weights of skeleton action sequence frames, allowing the model to adaptively focus on the features of representative frames for more efficient action recognition. Extensive experiments demonstrate that our LG-SGNet performs very competitively compared to the state-of-the-art methods. Our project website is available at <span><span>https://github.com/DingYyue/LG-SGNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111106"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
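A minimal PyTorch sketch of a local-global temporal block in the spirit of LG-TCN: a local temporal convolution runs in parallel with joint-wise multi-head self-attention over time, and the two branches are fused with a residual path. Kernel size, head count, and the fusion rule are assumptions.

```python
import torch
import torch.nn as nn

class LocalGlobalTemporalBlock(nn.Module):
    """Parallel local temporal convolution and per-joint global self-attention."""
    def __init__(self, channels: int = 64, kernel_size: int = 9, num_heads: int = 4):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, (kernel_size, 1), padding=(pad, 0)),
            nn.BatchNorm2d(channels),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                                  # x: (N, C, T, V)
        local = self.local(x)
        n, c, t, v = x.shape
        seq = x.permute(0, 3, 2, 1).reshape(n * v, t, c)   # one temporal sequence per joint
        glob, _ = self.attn(seq, seq, seq)                 # joint-wise global modeling
        glob = glob.reshape(n, v, t, c).permute(0, 3, 2, 1)
        return self.act(local + glob + x)                  # residual fusion of both branches

x = torch.randn(2, 64, 50, 25)                             # batch, channels, frames, joints
print(LocalGlobalTemporalBlock()(x).shape)                 # torch.Size([2, 64, 50, 25])
```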
Explainability-based knowledge distillation
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-30, DOI: 10.1016/j.patcog.2024.111095
Tianli Sun, Haonan Chen, Guosheng Hu, Cairong Zhao
{"title":"Explainability-based knowledge distillation","authors":"Tianli Sun ,&nbsp;Haonan Chen ,&nbsp;Guosheng Hu ,&nbsp;Cairong Zhao","doi":"10.1016/j.patcog.2024.111095","DOIUrl":"10.1016/j.patcog.2024.111095","url":null,"abstract":"<div><div>Knowledge distillation (KD) is a popular approach for deep model acceleration. Based on the knowledge distilled, we categorize KD methods as label-related and structure-related. The former distills the very abstract (high-level) knowledge, e.g., logits; and the latter uses the spatial (low- or medium-level feature) knowledge. However, existing KD methods are usually not explainable, i.e., we do not know what knowledge is transferred during distillation. In this work, we propose a new KD method, Explainability-based Knowledge Distillation (Exp-KD). Specifically, we propose to use class activation map (CAM) as the explainable knowledge which can effectively capture both label- and structure-related information during the distillation. We conduct extensive experiments, including image classification tasks on CIFAR-10, CIFAR-100 and ImageNet datasets, and explainability tests on ImageNet and ImageNet-Segmentation. The results show the great effectiveness and explainability of Exp-KD compared with the state-of-the-art. Code is available at <span><span>https://github.com/Blenderama/Exp-KD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111095"},"PeriodicalIF":7.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
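The abstract does not spell out the distillation objective, but a CAM-matching loss can be sketched from standard ingredients: build each network's class activation map for the labelled class from its last convolutional features and classifier weights, then penalize the discrepancy. All shapes and the MSE matching form are assumptions.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, labels):
    """CAM of the labelled class: channel-weighted sum of the final feature map.
    features: (B, C, H, W), fc_weight: (num_classes, C), labels: (B,)."""
    cam = torch.einsum('bchw,bc->bhw', features, fc_weight[labels])
    cam = F.relu(cam)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-6)   # per-sample normalisation

def cam_distillation_loss(s_feat, s_fc_w, t_feat, t_fc_w, labels):
    """Match student and teacher CAMs; the student map is resized to the teacher's."""
    s_cam = class_activation_map(s_feat, s_fc_w, labels)
    t_cam = class_activation_map(t_feat, t_fc_w, labels)
    s_cam = F.interpolate(s_cam.unsqueeze(1), size=t_cam.shape[-2:],
                          mode='bilinear', align_corners=False).squeeze(1)
    return F.mse_loss(s_cam, t_cam)

s_feat, t_feat = torch.randn(4, 128, 8, 8), torch.randn(4, 512, 7, 7)
s_w, t_w = torch.randn(10, 128), torch.randn(10, 512)
labels = torch.randint(0, 10, (4,))
print(cam_distillation_loss(s_feat, s_w, t_feat, t_w, labels))
```

In practice such a term would be combined with the usual cross-entropy and logit-distillation losses.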
Spatio-temporal interactive reasoning model for multi-group activity recognition
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-30, DOI: 10.1016/j.patcog.2024.111104
Jianglan Huang, Lindong Li, Linbo Qing, Wang Tang, Pingyu Wang, Li Guo, Yonghong Peng
{"title":"Spatio-temporal interactive reasoning model for multi-group activity recognition","authors":"Jianglan Huang ,&nbsp;Lindong Li ,&nbsp;Linbo Qing ,&nbsp;Wang Tang ,&nbsp;Pingyu Wang ,&nbsp;Li Guo ,&nbsp;Yonghong Peng","doi":"10.1016/j.patcog.2024.111104","DOIUrl":"10.1016/j.patcog.2024.111104","url":null,"abstract":"<div><div>Multi-group activity recognition aims to recognize sub-group activities in multi-person scenes. Existing works explore group-level features by simply using graph neural networks for reasoning about the individual interactions and directly aggregating individual features, which cannot fully mine the interactions between people and between sub-groups, resulting in the loss of useful information for group activity recognition. To address this problem, this paper proposes a Spatio-Temporal Interactive Reasoning Model (STIRM) to better exploit potential spatio-temporal interactions for multi-group activity recognition. In particular, we present an interactive feature extraction strategy to explore correlation features between individuals by analyzing the features of their nearest neighbor. We design a new clustering module that combines the action similarity feature and spatio-temporal trajectory feature to divide people into small groups. In addition, to obtain rich and accurate group-level features, a group interaction reasoning module is constructed to explore the interactions between different small groups and among people in the same group and exclude people who have less impact on group activities according to their importance. Extensive experiments on the Social-CAD, PLPS and JRDB-PAR datasets indicate the superiority of the proposed method over state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111104"},"PeriodicalIF":7.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
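The clustering module is not specified at implementation level; the sketch below is one hedged reading that combines an action-similarity affinity with a Gaussian trajectory-proximity affinity and then groups people with off-the-shelf spectral clustering. The weighting, bandwidth, and fixed group count are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def group_people(action_feat, traj_feat, n_groups=3, alpha=0.5, sigma=1.0):
    """Cluster people into sub-groups from combined action and trajectory affinities.
    action_feat: (N, D) action embeddings; traj_feat: (N, 2) average positions."""
    a = action_feat / np.linalg.norm(action_feat, axis=1, keepdims=True)
    action_sim = np.clip(a @ a.T, 0.0, 1.0)                       # cosine similarity
    dist = np.linalg.norm(traj_feat[:, None] - traj_feat[None], axis=-1)
    traj_sim = np.exp(-dist ** 2 / (2 * sigma ** 2))              # spatial proximity
    affinity = alpha * action_sim + (1 - alpha) * traj_sim
    return SpectralClustering(n_clusters=n_groups,
                              affinity='precomputed').fit_predict(affinity)

rng = np.random.default_rng(0)
print(group_people(rng.normal(size=(12, 64)), rng.normal(size=(12, 2))))
```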
Multi-task OCTA image segmentation with innovative dimension compression
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-30, DOI: 10.1016/j.patcog.2024.111123
Guogang Cao, Zeyu Peng, Zhilin Zhou, Yan Wu, Yunqing Zhang, Rugang Yan
{"title":"Multi-task OCTA image segmentation with innovative dimension compression","authors":"Guogang Cao,&nbsp;Zeyu Peng,&nbsp;Zhilin Zhou,&nbsp;Yan Wu,&nbsp;Yunqing Zhang,&nbsp;Rugang Yan","doi":"10.1016/j.patcog.2024.111123","DOIUrl":"10.1016/j.patcog.2024.111123","url":null,"abstract":"<div><div>Optical Coherence Tomography Angiography (OCTA) plays a crucial role in the early detection and continuous monitoring of ocular diseases, which relies on accurate multi-tissue segmentation of retinal images. Existing OCTA segmentation methods typically focus on single-task designs that do not fully utilize the information of volume data in these images. To bridge this gap, our study introduces H2C-Net, a novel network architecture engineered for simultaneous and precise segmentation of various retinal structures, including capillaries, arteries, veins, and the fovea avascular zone (FAZ). At its core, H2C-Net consists of a plug-and-play Height-Channel Module (H2C) and an Enhanced U-shaped Network (GPC-Net). The H2C module cleverly converts the height information of the OCTA volume data into channel information through the Squeeze operation, realizes the lossless dimensionality reduction from 3D to 2D, and provides the \"Soft layering\" information by unidirectional pooling. Meanwhile, in order to guide the network to focus on channels for training, U-Net is enhanced with group normalization, channel attention mechanism, and Parametric Rectified Linear Unit (PReLU), which reduces the dependence on batch size and enhances the network's ability to extract salient features. Extensive experiments on two subsets of the publicly available OCTA-500 dataset have shown that H2C-Net outperforms existing state-of-the-art methods. It achieves average Intersection over Union (IoU) scores of 82.84 % and 88.48 %, marking improvements of 0.81 % and 1.59 %, respectively. Similarly, the average Dice scores are elevated to 90.40 % and 93.76 %, exceeding previous benchmarks by 0.42 % and 0.94 %. The proposed H2C-Net exhibits excellent performance in OCTA image segmentation, providing an efficient and accurate multi-task segmentation solution in ophthalmic diagnostics. The code is publicly available at: <span><span>https://github.com/IAAI-SIT/H2C-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111123"},"PeriodicalIF":7.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
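A minimal sketch of the height-to-channel idea: fold the depth (height) axis of the OCTA volume into the channel axis, which is lossless, and append depth-only pooled slices as coarse "soft layering" cues for a 2D U-shaped segmenter. The pooling window and the concatenation are assumptions.

```python
import torch
import torch.nn.functional as F

def height_to_channel(volume: torch.Tensor, pool_window: int = 4) -> torch.Tensor:
    """volume: (B, C, D, H, W) OCTA data; returns a 2D tensor (B, C', H, W)."""
    b, c, d, h, w = volume.shape
    folded = volume.reshape(b, c * d, h, w)                  # lossless 3D -> 2D squeeze
    layered = F.max_pool3d(volume, kernel_size=(pool_window, 1, 1),
                           stride=(pool_window, 1, 1))       # pooling along depth only
    layered = layered.reshape(b, -1, h, w)                   # coarse "soft layer" maps
    return torch.cat([folded, layered], dim=1)

vol = torch.randn(1, 1, 128, 160, 160)                       # B, C, depth, H, W
print(height_to_channel(vol).shape)                          # torch.Size([1, 160, 160, 160])
```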
Cross-modal independent matching network for image-text retrieval
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-29, DOI: 10.1016/j.patcog.2024.111096
Xiao Ke, Baitao Chen, Xiong Yang, Yuhang Cai, Hao Liu, Wenzhong Guo
{"title":"Cross-modal independent matching network for image-text retrieval","authors":"Xiao Ke ,&nbsp;Baitao Chen ,&nbsp;Xiong Yang ,&nbsp;Yuhang Cai ,&nbsp;Hao Liu ,&nbsp;Wenzhong Guo","doi":"10.1016/j.patcog.2024.111096","DOIUrl":"10.1016/j.patcog.2024.111096","url":null,"abstract":"<div><div>Image-text retrieval serves as a bridge connecting vision and language. Mainstream modal cross matching methods can effectively perform cross-modal interactions with high theoretical performance. However, there is a deficiency in efficiency. Modal independent matching methods exhibit superior efficiency but lack in performance. Therefore, achieving a balance between matching efficiency and performance becomes a challenge in the field of image-text retrieval. In this paper, we propose a new Cross-modal Independent Matching Network (CIMN) for image-text retrieval. Specifically, we first use the proposed Feature Relationship Reasoning (FRR) to infer neighborhood and potential relations of modal features. Then, we introduce Graph Pooling (GP) based on graph convolutional networks to perform modal global semantic aggregation. Finally, we introduce the Gravitation Loss (GL) by incorporating sample mass into the learning process. This loss can correct the matching relationship between and within each modality, avoiding the problem of equal treatment of all samples in the traditional triplet loss. Extensive experiments on Flickr30K and MSCOCO datasets demonstrate the superiority of the proposed method. It achieves a good balance between matching efficiency and performance, surpasses other similar independent matching methods in performance, and can obtain retrieval accuracy comparable to some mainstream cross matching methods with an order of magnitude lower inference time.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111096"},"PeriodicalIF":7.5,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
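The abstract does not define the Gravitation Loss precisely; the sketch below is one hypothetical reading in which a per-sample "mass" weights a standard bidirectional hinge (triplet-style) retrieval objective, so that pairs involving more "massive" samples pull harder. The weighting form and all names here are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def gravitation_style_loss(img_emb, txt_emb, mass_img, mass_txt, margin=0.2):
    """Mass-weighted bidirectional hinge loss over a cosine similarity matrix."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    sim = img @ txt.T                                        # (B, B) similarity matrix
    pos = sim.diag().unsqueeze(1)                            # matched image-text pairs
    cost_i2t = (margin + sim - pos).clamp(min=0)             # image-to-text hinge
    cost_t2i = (margin + sim - pos.T).clamp(min=0)           # text-to-image hinge
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)   # ignore the diagonal
    weight = mass_img.unsqueeze(1) * mass_txt.unsqueeze(0)   # "gravitational" mass product
    return ((cost_i2t + cost_t2i) * weight * mask).mean()

img_e, txt_e = torch.randn(8, 256), torch.randn(8, 256)
mass_i, mass_t = torch.rand(8), torch.rand(8)
print(gravitation_style_loss(img_e, txt_e, mass_i, mass_t))
```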
Fully exploring object relation interaction and hidden state attention for video captioning
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-28, DOI: 10.1016/j.patcog.2024.111138
Feiniu Yuan, Sipei Gu, Xiangfen Zhang, Zhijun Fang
{"title":"Fully exploring object relation interaction and hidden state attention for video captioning","authors":"Feiniu Yuan ,&nbsp;Sipei Gu ,&nbsp;Xiangfen Zhang ,&nbsp;Zhijun Fang","doi":"10.1016/j.patcog.2024.111138","DOIUrl":"10.1016/j.patcog.2024.111138","url":null,"abstract":"<div><div>Video Captioning (VC) is a challenging task of automatically generating natural language sentences for describing video contents. As a video often contains multiple objects, it is comprehensively crucial to identify multiple objects and model relationships between them. Previous models usually adopt Graph Convolutional Networks (GCN) to infer relational information via object nodes, but there exist uncertainty and over-smoothing issues of relational reasoning. To tackle these issues, we propose a Knowledge Graph based Video Captioning Network (KG-VCN) by fully exploring object relation interaction, hidden state and attention enhancement. In encoding stages, we present a Graph and Convolution Hybrid Encoder (GCHE), which uses an object detector to find visual objects with bounding boxes for Knowledge Graph (KG) and Convolutional Neural Network (CNN). To model intrinsic relations between detected objects, we propose a knowledge graph based Object Relation Graph Interaction (ORGI) module. In ORGI, we design triplets (<em>head, relation, tail</em>) to efficiently mine object relations, and create a global node to enable adequate information flow among all graph nodes for avoiding possibly missed relations. To produce accurate and rich captions, we propose a hidden State and Attention Enhanced Decoder (SAED) by integrating hidden states and dynamically updated attention features. Our SAED accepts both relational and visual features, adopts Long Short-Term Memory (LSTM) to produce hidden states, and dynamically update attention features. Unlike existing methods, we concatenate state and attention features to predict next word sequentially. To demonstrate the effectiveness of our model, we conduct experiments on three well-known datasets (MSVD, MSR-VTT, VaTeX), and our model achieves impressive results significantly outperforming existing state-of-the-art models.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111138"},"PeriodicalIF":7.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
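A minimal sketch of a decoding step that, as described, concatenates the LSTM hidden state with the attended context to predict the next word, rather than classifying from the hidden state alone. The additive attention form and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class StateAttentionDecoderStep(nn.Module):
    """One captioning step: attend over visual/relational features, update the LSTM,
    then predict the next word from [hidden state ; attention context]."""
    def __init__(self, vocab=1000, emb=300, feat=512, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.cell = nn.LSTMCell(emb + feat, hid)
        self.att_w = nn.Linear(feat, hid, bias=False)
        self.att_h = nn.Linear(hid, hid, bias=False)
        self.att_v = nn.Linear(hid, 1, bias=False)
        self.out = nn.Linear(hid + feat, vocab)

    def forward(self, word, feats, h, c):                # feats: (B, N, feat)
        scores = self.att_v(torch.tanh(self.att_w(feats) + self.att_h(h).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)             # attention over N features
        ctx = (alpha * feats).sum(dim=1)                 # attended context vector
        h, c = self.cell(torch.cat([self.embed(word), ctx], dim=1), (h, c))
        logits = self.out(torch.cat([h, ctx], dim=1))    # concatenate state and attention
        return logits, h, c

step = StateAttentionDecoderStep()
h = c = torch.zeros(2, 512)
logits, h, c = step(torch.tensor([1, 2]), torch.randn(2, 36, 512), h, c)
print(logits.shape)                                      # torch.Size([2, 1000])
```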
A Newton interpolation network for smoke semantic segmentation
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-28, DOI: 10.1016/j.patcog.2024.111119
Feiniu Yuan, Guiqian Wang, Qinghua Huang, Xuelong Li
{"title":"A newton interpolation network for smoke semantic segmentation","authors":"Feiniu Yuan ,&nbsp;Guiqian Wang ,&nbsp;Qinghua Huang ,&nbsp;Xuelong Li","doi":"10.1016/j.patcog.2024.111119","DOIUrl":"10.1016/j.patcog.2024.111119","url":null,"abstract":"<div><div>Smoke has large variances of visual appearances that are very adverse to visual segmentation. Furthermore, its semi-transparency often produces highly complicated mixtures of smoke and backgrounds. These factors lead to great difficulties in labelling and segmenting smoke regions. To improve accuracy of smoke segmentation, we propose a Newton Interpolation Network (NINet) for visual smoke semantic segmentation. Unlike simply concatenating or point-wisely adding multi-scale encoded feature maps for information fusion or re-usage, we design a Newton Interpolation Module (NIM) to extract structured information by analyzing the feature values in the same position but from encoded feature maps with different scales. Interpolated features by our NIM contain long-range dependency and semantic structures across different levels, but traditional fusion of multi-scale feature maps cannot model intrinsic structures embedded in these maps. To obtain multi-scale structured information, we repeatedly use the proposed NIM at different levels of the decoding stages. In addition, we use more encoded feature maps to construct a higher order Newton interpolation polynomial for extracting higher order information. Extensive experiments validate that our method significantly outperforms existing state-of-the-art algorithms on virtual and real smoke datasets, and ablation experiments also validate the effectiveness of our NIMs.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111119"},"PeriodicalIF":7.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
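The core of the NIM is classical Newton interpolation applied position-wise across multi-scale feature maps. The sketch below aligns the maps, builds the divided-difference coefficients, and evaluates the Newton polynomial at a query point; node placement, the query point, and bilinear alignment are assumptions about details the abstract leaves open.

```python
import torch
import torch.nn.functional as F

def newton_interpolate_features(feature_maps, x_query=1.5):
    """Treat k multi-scale maps as samples f(x_0..x_{k-1}) at each spatial position,
    build Newton divided differences, and evaluate the polynomial at x_query."""
    size = feature_maps[0].shape[-2:]
    maps = [F.interpolate(m, size=size, mode='bilinear', align_corners=False)
            for m in feature_maps]                       # align spatial resolutions
    x = torch.arange(len(maps), dtype=torch.float32)     # interpolation nodes 0, 1, ...
    table = [m.clone() for m in maps]                    # zeroth-order differences
    coeffs = [table[0]]
    for order in range(1, len(maps)):                    # divided-difference table
        table = [(table[i + 1] - table[i]) / (x[i + order] - x[i])
                 for i in range(len(table) - 1)]
        coeffs.append(table[0])
    result, basis = coeffs[0], 1.0
    for i in range(1, len(coeffs)):                      # Newton-form evaluation
        basis = basis * (x_query - x[i - 1].item())
        result = result + coeffs[i] * basis
    return result

maps = [torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16), torch.randn(1, 64, 8, 8)]
print(newton_interpolate_features(maps).shape)           # torch.Size([1, 64, 32, 32])
```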
Exploring sample relationship for few-shot classification
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-28, DOI: 10.1016/j.patcog.2024.111089
Xingye Chen, Wenxiao Wu, Li Ma, Xinge You, Changxin Gao, Nong Sang, Yuanjie Shao
{"title":"Exploring sample relationship for few-shot classification","authors":"Xingye Chen ,&nbsp;Wenxiao Wu ,&nbsp;Li Ma ,&nbsp;Xinge You ,&nbsp;Changxin Gao ,&nbsp;Nong Sang ,&nbsp;Yuanjie Shao","doi":"10.1016/j.patcog.2024.111089","DOIUrl":"10.1016/j.patcog.2024.111089","url":null,"abstract":"<div><div>Few-shot classification (FSC) is a challenging problem, which aims to identify novel classes with limited samples. Most existing methods employ vanilla transfer learning or episodic meta-training to learn a feature extractor, and then measure the similarity between the query image and the few support examples of novel classes. However, these approaches merely learn feature representations from individual images, overlooking the exploration of the interrelationships among images. This neglect can hinder the attainment of more discriminative feature representations, thus limiting the potential improvement of few-shot classification performance. To address this issue, we propose a Sample Relationship Exploration (SRE) module comprising the Sample-level Attention (SA), Explicit Guidance (EG) and Channel-wise Adaptive Fusion (CAF) components, to learn discriminative category-related features. Specifically, we first employ the SA component to explore the similarity relationships among samples and obtain aggregated features of similar samples. Furthermore, to enhance the robustness of these features, we introduce the EG component to explicitly guide the learning of sample relationships by providing an ideal affinity map among samples. Finally, the CAF component is adopted to perform weighted fusion of the original features and the aggregated features, yielding category-related embeddings. The proposed method is a plug-and-play module which can be embedded into both transfer learning and meta-learning based few-shot classification frameworks. Extensive experiments on benchmark datasets show that the proposed module can effectively improve the performance over baseline models, and also perform competitively against the state-of-the-art algorithms. The source code is available at <span><span>https://github.com/Chenguoz/SRE</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111089"},"PeriodicalIF":7.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
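A minimal sketch of sample-level attention followed by channel-wise adaptive fusion over the features of an episode; the gate design and feature dimension are assumptions, and the Explicit Guidance component is omitted here.

```python
import torch
import torch.nn as nn

class SampleRelationBlock(nn.Module):
    """Aggregate features from similar samples via sample-level attention, then fuse
    original and aggregated features with a channel-wise sigmoid gate."""
    def __init__(self, dim: int = 640):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, feats):                            # feats: (N, dim), N samples
        scores = self.q(feats) @ self.k(feats).T / feats.size(1) ** 0.5
        attn = torch.softmax(scores, dim=-1)             # similarity among samples
        aggregated = attn @ feats                        # features of similar samples
        g = self.gate(torch.cat([feats, aggregated], dim=1))
        return g * feats + (1 - g) * aggregated          # channel-wise adaptive fusion

feats = torch.randn(25, 640)                             # e.g. a 5-way 5-shot episode
print(SampleRelationBlock()(feats).shape)                # torch.Size([25, 640])
```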
SeaTrack: Rethinking Observation-Centric SORT for Robust Nearshore Multiple Object Tracking
IF 7.5, Q1 (Computer Science)
Pattern Recognition, Pub Date: 2024-10-28, DOI: 10.1016/j.patcog.2024.111091
Jiangang Ding, Wei Li, Ming Yang, Yuanlin Zhao, Lili Pei, Aojia Tian
{"title":"SeaTrack: Rethinking Observation-Centric SORT for Robust Nearshore Multiple Object Tracking","authors":"Jiangang Ding ,&nbsp;Wei Li ,&nbsp;Ming Yang ,&nbsp;Yuanlin Zhao ,&nbsp;Lili Pei ,&nbsp;Aojia Tian","doi":"10.1016/j.patcog.2024.111091","DOIUrl":"10.1016/j.patcog.2024.111091","url":null,"abstract":"<div><div>Nearshore Multiple Object Tracking (NMOT) aims to locate and associate nearshore objects. Current approaches utilize Automatic Identification Systems (AIS) and radar to accomplish this task. However, video signals can describe the visual appearance of nearshore objects without prior information such as identity, location, or motion. In addition, sea clutter will not affect the capture of living objects by visual sensors. Recognizing this, we analyzed three key long-term challenges of the vision-based NMOT and proposed a tracking pipeline that relies solely on motion information. Maritime objects are highly susceptible to being obscured or submerged by waves, resulting in fragmented tracklets. We first introduced guiding modulation to address the long-term occlusion and interaction of maritime objects. Subsequently, we modeled confidence, altitude, and angular momentum to mitigate the effects of motion blur, ringing, and overshoot artifacts to observations in unstable imaging environments. Additionally, we designed a motion fusion mechanism that combines long-term macro tracklets with short-term fine-grained tracklets. This correction mechanism helps reduce the estimation variance of the Kalman Filter (KF) to alleviate the substantial nonlinear motion of maritime objects. We call this pipeline SeaTrack, which remains simple, online, and real-time, demonstrating excellent performance and scalability in benchmark evaluations.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111091"},"PeriodicalIF":7.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142561362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
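A hedged sketch of the motion-fusion idea: blend a long-horizon (macro tracklet) velocity estimate with a short-horizon (fine-grained) one before it is used as the motion prior for the Kalman filter, in the spirit of observation-centric SORT variants. Window sizes and the blending weight are assumptions.

```python
import numpy as np

def fused_velocity(track_centers, long_window=10, short_window=2, lam=0.7):
    """Blend coarse long-term and responsive short-term velocity estimates.
    track_centers: sequence of (x, y) box centres for one tracklet."""
    centers = np.asarray(track_centers, dtype=float)
    lw = min(long_window, len(centers) - 1)
    sw = min(short_window, len(centers) - 1)
    v_long = (centers[-1] - centers[-1 - lw]) / lw       # noise-resistant macro estimate
    v_short = (centers[-1] - centers[-1 - sw]) / sw      # reacts to recent motion
    return lam * v_long + (1.0 - lam) * v_short

track = [(10 + 2.0 * t + 0.3 * np.sin(t), 5 + 1.5 * t) for t in range(15)]
print(fused_velocity(track))                             # close to the true velocity (2.0, 1.5)
```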