Latest Publications in IEEE Transactions on Multimedia

Learning Local Features by Reinforcing Spatial Structure Information
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-30 DOI: 10.1109/TMM.2024.3521777
Li Wang;Yunzhou Zhang;Fawei Ge;Wenjing Bai;Yifan Wang
{"title":"Learning Local Features by Reinforcing Spatial Structure Information","authors":"Li Wang;Yunzhou Zhang;Fawei Ge;Wenjing Bai;Yifan Wang","doi":"10.1109/TMM.2024.3521777","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521777","url":null,"abstract":"Learning-based local feature extraction algorithms have advanced considerably in terms of robustness. While excelling at enhancing feature robustness, some outstanding algorithms tend to neglect discriminability—a crucial aspect in vision tasks. With the increase of deep learning convolutional layers, we observe an amplification of semantic information within images, accompanied by a diminishing presence of spatial structural information. This imbalance primarily contributes to the subpar feature discriminability. Therefore, this paper introduces a novel network framework aimed at imbuing feature descriptors with robustness and discriminative power by reinforcing spatial structural information. Our approach incorporates a spatial structure enhancement module into the network architecture, spanning from shallow to deep layers, ensuring the retention of rich structural information in deeper layers, thereby enhancing discriminability. Finally, we evaluate our method, demonstrating superior performance in visual localization and feature-matching tasks.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1420-1431"},"PeriodicalIF":8.4,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
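The abstract above describes a spatial structure enhancement module that carries structure-rich shallow features into the deeper, semantics-heavy layers. Below is a minimal PyTorch sketch of one way such a module could be wired; the module name, channel sizes, and the concatenate-and-fuse design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialStructureEnhancement(nn.Module):
    """Illustrative module: re-injects shallow (structure-rich) features
    into a deeper (semantics-rich) feature map before description."""

    def __init__(self, shallow_ch: int, deep_ch: int):
        super().__init__()
        # Project shallow features to the deep channel width.
        self.proj = nn.Conv2d(shallow_ch, deep_ch, kernel_size=1)
        # Fuse the concatenated maps back to deep_ch channels.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * deep_ch, deep_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(deep_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Match the spatial resolution of the deeper map before fusing.
        shallow = F.interpolate(self.proj(shallow), size=deep.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([shallow, deep], dim=1))

# Example: 64-channel shallow map at 1/2 resolution, 256-channel deep map at 1/8.
module = SpatialStructureEnhancement(shallow_ch=64, deep_ch=256)
out = module(torch.randn(1, 64, 128, 128), torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 256, 32, 32])
```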
Disaggregation Distillation for Person Search
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-30 DOI: 10.1109/TMM.2024.3521732
Yizhen Jia;Rong Quan;Haiyan Chen;Jiamei Liu;Yichao Yan;Song Bai;Jie Qin
{"title":"Disaggregation Distillation for Person Search","authors":"Yizhen Jia;Rong Quan;Haiyan Chen;Jiamei Liu;Yichao Yan;Song Bai;Jie Qin","doi":"10.1109/TMM.2024.3521732","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521732","url":null,"abstract":"Person search is a challenging task in computer vision and multimedia understanding, which aims at localizing and identifying target individuals in realistic scenes. State-of-the-art models achieve remarkable success but suffer from overloaded computation and inefficient inference, making them impractical in most real-world applications. A promising approach to tackle this dilemma is to compress person search models with knowledge distillation (KD). Previous KD-based person search methods typically distill the knowledge from the re-identification (re-id) branch, completely overlooking the useful knowledge from the detection branch. In addition, we elucidate that the imbalance between person and background regions in feature maps has a negative impact on the distillation process. To this end, we propose a novel KD-based approach, namely Disaggregation Distillation for Person Search (DDPS), which disaggregates the distillation process and feature maps, respectively. Firstly, the distillation process is disaggregated into two task-oriented sub-processes, <italic>i.e.</i>, detection distillation and re-id distillation, to help the student learn both accurate localization capability and discriminative person embeddings. Secondly, we disaggregate each feature map into person and background regions, and distill these two regions independently to alleviate the imbalance problem. More concretely, three types of distillation modules, <italic>i.e.</i>, logit distillation (LD), correlation distillation (CD), and disaggregation feature distillation (DFD), are particularly designed to transfer comprehensive information from the teacher to the student. Note that such a simple yet effective distillation scheme can be readily applied to both homogeneous and heterogeneous teacher-student combinations. We conduct extensive experiments on two person search benchmarks, where the results demonstrate that, surprisingly, our DDPS enables the student model to surpass the performance of the corresponding teacher model, even achieving comparable results with general person search models.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"158-170"},"PeriodicalIF":8.4,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
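To make the disaggregation idea concrete, here is a hedged sketch of splitting a feature map into person and background regions from ground-truth boxes and distilling each region with its own loss term. The box-to-mask rasterization, the MSE form, and the equal weighting are assumptions for illustration, not the DDPS formulation.

```python
import torch

def boxes_to_mask(boxes, h, w, stride):
    """Rasterize person boxes (x1, y1, x2, y2 in image coordinates) into a
    binary mask on an h x w feature grid with the given stride."""
    mask = torch.zeros(h, w)
    for x1, y1, x2, y2 in boxes:
        mask[int(y1 // stride):int(y2 // stride) + 1,
             int(x1 // stride):int(x2 // stride) + 1] = 1.0
    return mask

def disaggregated_feature_loss(f_student, f_teacher, boxes, stride=16):
    """Distill person and background regions with separate MSE terms so the
    (usually small) person area is not drowned out by the background."""
    _, _, h, w = f_teacher.shape
    person = boxes_to_mask(boxes, h, w, stride).to(f_teacher.device)
    background = 1.0 - person
    se = (f_student - f_teacher).pow(2).mean(dim=1)  # (N, H, W) channel-averaged error
    person_loss = (se * person).sum() / person.sum().clamp(min=1.0)
    bg_loss = (se * background).sum() / background.sum().clamp(min=1.0)
    return person_loss + bg_loss  # equal weighting is an assumption

# Toy usage: one image, 256-channel teacher/student maps at stride 16.
fs, ft = torch.randn(1, 256, 40, 64), torch.randn(1, 256, 40, 64)
print(disaggregated_feature_loss(fs, ft, boxes=[(100, 80, 220, 400)]).item())
```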
Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-30 DOI: 10.1109/TMM.2024.3521798
Yi Xiao;Qiangqiang Yuan;Kui Jiang;Yuzeng Chen;Qiang Zhang;Chia-Wen Lin
{"title":"Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution","authors":"Yi Xiao;Qiangqiang Yuan;Kui Jiang;Yuzeng Chen;Qiang Zhang;Chia-Wen Lin","doi":"10.1109/TMM.2024.3521798","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521798","url":null,"abstract":"Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-scale RSI. To alleviate these issues, we develop the first attempt to integrate the Vision State Space Model (Mamba) for RSI-SR, which specializes in processing large-scale RSI by capturing long-range dependency with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore the spatial and frequent correlations. In particular, our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM) to grasp their merits for effective spatial-frequency fusion. Considering that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features for accurate feature fusion via learnable scaling adaptors. Extensive experiments on AID, DOTA, and DIOR benchmarks demonstrate that our FMSR outperforms state-of-the-art Transformer-based methods HAT-L in terms of PSNR by 0.11 dB on average, while consuming only 28.05% and 19.08% of its memory consumption and complexity, respectively.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1783-1796"},"PeriodicalIF":8.4,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143800806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
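The abstract mentions recalibrating complementary global and local multi-level features with learnable scaling adaptors before fusion. The sketch below shows one generic form such a learnable per-branch scaling could take; the per-channel scale design, branch count, and refinement convolution are assumptions, not the FMSR modules.

```python
import torch
import torch.nn as nn

class ScalingAdaptorFusion(nn.Module):
    """Illustrative fusion: each branch gets a learnable per-channel scale
    before the features are summed and lightly refined."""

    def __init__(self, channels: int, num_branches: int = 2):
        super().__init__()
        self.scales = nn.Parameter(torch.ones(num_branches, channels, 1, 1))
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, branches):
        # Weight each branch by its learned scale, then sum and refine.
        fused = sum(self.scales[i] * f for i, f in enumerate(branches))
        return self.refine(fused)

# Toy usage: fuse a "global" and a "local" 64-channel feature map.
fusion = ScalingAdaptorFusion(channels=64)
g, l = torch.randn(2, 64, 48, 48), torch.randn(2, 64, 48, 48)
print(fusion([g, l]).shape)  # torch.Size([2, 64, 48, 48])
```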
IEEE Transactions on Multimedia Publication Information
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-27 DOI: 10.1109/TMM.2024.3444988
{"title":"IEEE Transactions on Multimedia Publication Information","authors":"","doi":"10.1109/TMM.2024.3444988","DOIUrl":"https://doi.org/10.1109/TMM.2024.3444988","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"C2-C2"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10817140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Structure-Aware Pre-Selected Neural Rendering for Light Field Reconstruction
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-27 DOI: 10.1109/TMM.2024.3521784
Song Chang;Youfang Lin;Shuo Zhang
{"title":"Structure-Aware Pre-Selected Neural Rendering for Light Field Reconstruction","authors":"Song Chang;Youfang Lin;Shuo Zhang","doi":"10.1109/TMM.2024.3521784","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521784","url":null,"abstract":"As densely-sampled Light Field (LF) images are beneficial to many applications, LF reconstruction becomes an important technology in related fields. Recently, neural rendering shows great potential in reconstruction tasks. However, volume rendering in existing methods needs to sample many points on the whole camera ray or epipolar line, which is time-consuming. In this paper, specifically for LF images with regular angular sampling, we propose a novel Structure-Aware Pre-Selected neural rendering framework for LF reconstruction. Instead of sampling on the whole epipolar line, we propose to sample on several specific positions, which are estimated using the color and inherent scene structure information explored in the regular angular sampled LF images. Sampling only a few points that closely match the target pixel, the feature of the target pixel is quickly rendered with high-quality. Finally, we fuse the features and decode them in the view dimension to obtain the final target view. Experiments show that the proposed method outperforms the state-of-the-art LF reconstruction methods in both qualitative and quantitative comparisons across various tasks. Our method also surpasses the most existing methods in terms of speed. Moreover, without any retraining or fine-tuning, the performance of our method with no-per-scene optimization is even better than the methods with per-scene optimization.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1574-1587"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
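The efficiency argument above rests on sampling only a few pre-selected positions along the epipolar line instead of sampling it densely. Here is a hedged sketch of that gather-then-fuse step for a horizontally rectified view pair; the disparity-candidate selection and the simple mean fusion are placeholders, not the paper's renderer.

```python
import torch
import torch.nn.functional as F

def gather_preselected(source_feat, disparities):
    """Gather source-view features at a few candidate disparities per target
    pixel (horizontal-parallax case) rather than along the whole epipolar line.
    source_feat: (C, H, W); disparities: (K, H, W) in pixels."""
    c, h, w = source_feat.shape
    k = disparities.shape[0]
    xs = torch.arange(w).view(1, 1, w).expand(k, h, w).float()
    ys = torch.arange(h).view(1, h, 1).expand(k, h, w).float()
    # Shift the sampling position by each candidate disparity, normalize to [-1, 1].
    grid_x = 2.0 * (xs - disparities) / (w - 1) - 1.0
    grid_y = 2.0 * ys / (h - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)              # (K, H, W, 2)
    feat = source_feat.unsqueeze(0).expand(k, c, h, w)
    samples = F.grid_sample(feat, grid, align_corners=True)   # (K, C, H, W)
    return samples.mean(dim=0)  # placeholder fusion over the K candidates

# Toy usage: 32-channel features, 3 candidate disparities per pixel.
feat = torch.randn(32, 64, 64)
cands = torch.rand(3, 64, 64) * 4.0
print(gather_preselected(feat, cands).shape)  # torch.Size([32, 64, 64])
```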
MaskBlur: Spatial and Angular Data Augmentation for Light Field Image Super-Resolution
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-27 DOI: 10.1109/TMM.2024.3521781
Wentao Chao;Fuqing Duan;Yulan Guo;Guanghui Wang
{"title":"MaskBlur: Spatial and Angular Data Augmentation for Light Field Image Super-Resolution","authors":"Wentao Chao;Fuqing Duan;Yulan Guo;Guanghui Wang","doi":"10.1109/TMM.2024.3521781","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521781","url":null,"abstract":"Data augmentation (DA) is an effective approach for enhancing model performance with limited data, such as light field (LF) image super-resolution (SR). LF images inherently possess rich spatial and angular information. Nonetheless, there is a scarcity of DA methodologies explicitly tailored for LF images, and existing works tend to concentrate solely on either the spatial or angular domain. This paper proposes a novel spatial and angular DA strategy named MaskBlur for LF image SR by concurrently addressing spatial and angular aspects. MaskBlur consists of spatial blur and angular dropout two components. Spatial blur is governed by a spatial mask, which controls where pixels are blurred, i.e., pasting pixels between the low-resolution and high-resolution domains. The angular mask is responsible for angular dropout, i.e., selecting which views to perform the spatial blur operation. By doing so, MaskBlur enables the model to treat pixels differently in the spatial and angular domains when super-resolving LF images rather than blindly treating all pixels equally. Extensive experiments demonstrate the efficacy of MaskBlur in significantly enhancing the performance of existing SR methods. We further extend MaskBlur to other LF image tasks such as denoising, deblurring, low-light enhancement, and real-world SR.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"2181-2193"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
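A hedged sketch of the spatial-mask/angular-dropout idea described above: blurred (down-then-upsampled) pixels are pasted into a random subset of views wherever a random block mask is active. The block mask shape, mask ratio, view-selection rate, and bicubic blur are illustrative assumptions rather than the released MaskBlur recipe.

```python
import torch
import torch.nn.functional as F

def maskblur_like(hr_views, mask_ratio=0.3, view_rate=0.5, block=8, scale=4):
    """hr_views: (V, C, H, W) high-resolution sub-aperture images.
    For a random subset of views, paste blurred pixels into the HR image
    wherever a coarse random block mask is 1."""
    v, c, h, w = hr_views.shape
    out = hr_views.clone()
    # Blurred counterpart: bicubic down/up by the SR scale factor.
    lr = F.interpolate(hr_views, scale_factor=1 / scale, mode="bicubic",
                       align_corners=False)
    blurred = F.interpolate(lr, size=(h, w), mode="bicubic", align_corners=False)
    # Angular dropout: choose which views receive the spatial blur.
    chosen = torch.rand(v) < view_rate
    # Spatial mask: coarse random blocks upsampled to full resolution.
    coarse = (torch.rand(v, 1, h // block, w // block) < mask_ratio).float()
    mask = F.interpolate(coarse, size=(h, w), mode="nearest")
    for i in range(v):
        if chosen[i]:
            out[i] = mask[i] * blurred[i] + (1 - mask[i]) * hr_views[i]
    return out

# Toy usage: a 5x5 light field flattened to 25 views of 3x128x128.
aug = maskblur_like(torch.rand(25, 3, 128, 128))
print(aug.shape)  # torch.Size([25, 3, 128, 128])
```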
GPT4Ego: Unleashing the Potential of Pre-Trained Models for Zero-Shot Egocentric Action Recognition
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-27 DOI: 10.1109/TMM.2024.3521658
Guangzhao Dai;Xiangbo Shu;Wenhao Wu;Rui Yan;Jiachao Zhang
{"title":"GPT4Ego: Unleashing the Potential of Pre-Trained Models for Zero-Shot Egocentric Action Recognition","authors":"Guangzhao Dai;Xiangbo Shu;Wenhao Wu;Rui Yan;Jiachao Zhang","doi":"10.1109/TMM.2024.3521658","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521658","url":null,"abstract":"Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in some egocentric tasks, Zero-Shot Egocentric Action Recognition (ZS-EAR), entailing VLMs zero-shot to recognize actions from first-person videos enriched in more realistic human-environment interactions. Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of vision and linguistic knowledge. We propose a refined approach for ZS-EAR using VLMs, emphasizing fine-grained concept-description alignment that capitalizes on the rich semantic and contextual details in egocentric videos. In this work, we introduce a straightforward yet remarkably potent VLM framework, <italic>aka</i> GPT4Ego, designed to enhance the fine-grained alignment of concept and description between vision and language. Specifically, we first propose a new Ego-oriented Text Prompting (EgoTP<inline-formula><tex-math>$spadesuit$</tex-math></inline-formula>) scheme, which effectively prompts action-related text-contextual semantics by evolving word-level class names to sentence-level contextual descriptions by ChatGPT with well-designed chain-of-thought textual prompts. Moreover, we design a new Ego-oriented Visual Parsing (EgoVP<inline-formula><tex-math>$clubsuit$</tex-math></inline-formula>) strategy that learns action-related vision-contextual semantics by refining global-level images to part-level contextual concepts with the help of SAM. Extensive experiments demonstrate GPT4Ego significantly outperforms existing VLMs on three large-scale egocentric video benchmarks, i.e., EPIC-KITCHENS-100 (33.2%<inline-formula><tex-math>$uparrow$</tex-math></inline-formula><inline-formula><tex-math>$_{bm {+9.4}}$</tex-math></inline-formula>), EGTEA (39.6%<inline-formula><tex-math>$uparrow$</tex-math></inline-formula><inline-formula><tex-math>$_{bm {+5.5}}$</tex-math></inline-formula>), and CharadesEgo (31.5%<inline-formula><tex-math>$uparrow$</tex-math></inline-formula><inline-formula><tex-math>$_{bm {+2.6}}$</tex-math></inline-formula>). In addition, benefiting from the novel mechanism of fine-grained concept and description alignment, GPT4Ego can sustainably evolve with the advancement of ever-growing pre-trained foundational models. We hope this work can encourage the egocentric community to build more investigation into pre-trained vision-language models.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"401-413"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
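The EgoTP♠ step above expands word-level class names into sentence-level contextual descriptions by querying ChatGPT with chain-of-thought prompts. Below is a hedged sketch of how such a prompt might be assembled; the template wording and helper name are invented for illustration, and neither an actual ChatGPT call nor the authors' prompt is shown.

```python
def build_egocentric_prompt(class_name: str, num_descriptions: int = 3) -> str:
    """Assemble a chain-of-thought style prompt asking a chat model to turn a
    short action label into contextual first-person descriptions. The template
    text is an illustrative assumption, not the GPT4Ego prompt."""
    return (
        f"The egocentric action label is: '{class_name}'.\n"
        "Step 1: Reason about which objects, hands, and surroundings are "
        "typically visible from a first-person view during this action.\n"
        "Step 2: Reason about the motion being performed and its purpose.\n"
        f"Step 3: Write {num_descriptions} one-sentence descriptions of a "
        "first-person video showing this action, mentioning the relevant "
        "objects and context."
    )

# The returned descriptions would then replace the bare class name as the text
# side of video-text matching, e.g. for EPIC-KITCHENS-100 labels.
print(build_egocentric_prompt("cut onion"))
```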
VGNet: Multimodal Feature Extraction and Fusion Network for 3D CAD Model Retrieval
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-27 DOI: 10.1109/TMM.2024.3521706
Feiwei Qin;Gaoyang Zhan;Meie Fang;C. L. Philip Chen;Ping Li
{"title":"VGNet: Multimodal Feature Extraction and Fusion Network for 3D CAD Model Retrieval","authors":"Feiwei Qin;Gaoyang Zhan;Meie Fang;C. L. Philip Chen;Ping Li","doi":"10.1109/TMM.2024.3521706","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521706","url":null,"abstract":"The reuse of 3D CAD models is crucial for industrial manufacturing because it shortens development cycles and reduces costs. Significant progress has been made in deep learning-based 3D model retrievals. There are many representations for 3D models, among which the multi-view representation has demonstrated a superior retrieval performance. However, directly applying these 3D model retrieval approaches to 3D CAD model retrievals may result in issues such as the loss of the engineering semantic and structural information. In this paper, we find that multiple views and B-rep can complement each other. Therefore, we propose the view graph neural network (VGNet), which effectively combines multiple views and B-rep to accomplish 3D CAD model retrieval. More specifically, based on the characteristics of the regular shape of 3D CAD models, and the richness of the attribute information in the B-rep attribute graph, we separately design two feature extraction networks for each modality. Moreover, to explore the latent relationships between the multiple views and B-rep attribute graphs, a multi-head attention enhancement module is designed. Furthermore, the multimodal fusion module is adopted to make the joint representation of the 3D CAD models more discriminative by using a correlation loss function. Experiments are carried out on a real manufacturing 3D CAD dataset and a public dataset to validate the effectiveness of the proposed approach.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1432-1447"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
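The abstract states that a correlation loss keeps the joint multi-view/B-rep representation discriminative. One common way to couple two modality embeddings is to pull matched pairs together and push mismatched pairs apart within a batch; the generic contrastive-style form below is an assumption for illustration, not necessarily the loss used in VGNet.

```python
import torch
import torch.nn.functional as F

def correlation_loss(view_emb: torch.Tensor, graph_emb: torch.Tensor) -> torch.Tensor:
    """Encourage the multi-view embedding and the B-rep graph embedding of the
    same CAD model to agree: maximize cosine similarity of matched pairs and
    penalize similarity of mismatched pairs within the batch."""
    v = F.normalize(view_emb, dim=1)
    g = F.normalize(graph_emb, dim=1)
    sim = v @ g.t()                          # (B, B) pairwise cosine similarities
    matched = sim.diag().mean()              # same-model pairs
    b = sim.shape[0]
    off_diag = sim[~torch.eye(b, dtype=torch.bool)].mean()  # different models
    return (1.0 - matched) + off_diag.clamp(min=0.0)

# Toy usage: a batch of 8 CAD models with 128-d embeddings per modality.
loss = correlation_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```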
Multiview Feature Decoupling for Deep Subspace Clustering
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-27 DOI: 10.1109/TMM.2024.3521776
Yuxiu Lin;Hui Liu;Ren Wang;Qiang Guo;Caiming Zhang
{"title":"Multiview Feature Decoupling for Deep Subspace Clustering","authors":"Yuxiu Lin;Hui Liu;Ren Wang;Qiang Guo;Caiming Zhang","doi":"10.1109/TMM.2024.3521776","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521776","url":null,"abstract":"Deep multi-view subspace clustering aims to reveal a common subspace structure by exploiting rich multi-view information. Despite promising progress, current methods focus only on multi-view consistency and complementarity, often overlooking the adverse influence of entangled superfluous information in features. Moreover, most existing works lack scalability and are inefficient for large-scale scenarios. To this end, we innovatively propose a deep subspace clustering method via Multi-view Feature Decoupling (MvFD). First, MvFD incorporates well-designed multi-type auto-encoders with self-supervised learning, explicitly decoupling consistent, complementary, and superfluous features for every view. The disentangled and interpretable feature space can then better serve unified representation learning. By integrating these three types of information within a unified framework, we employ information theory to obtain a minimal and sufficient representation with high discriminability. Besides, we introduce a deep metric network to model self-expression correlation more efficiently, where network parameters remain unaffected by changes in sample numbers. Extensive experiments show that MvFD yields State-of-the-Art performance in various types of multi-view datasets.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"544-556"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
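To illustrate the decoupling step described above, here is a minimal sketch of a per-view auto-encoder whose latent code is split into consistent, complementary, and superfluous parts while a reconstruction path keeps the code informative. The equal three-way split, MLP sizes, and module name are assumptions, not the MvFD architecture.

```python
import torch
import torch.nn as nn

class DecoupledViewAutoEncoder(nn.Module):
    """Illustrative per-view auto-encoder that splits its latent code into
    consistent / complementary / superfluous parts and reconstructs the
    view from the full code."""

    def __init__(self, in_dim: int, latent_dim: int = 96):
        super().__init__()
        assert latent_dim % 3 == 0  # equal split is an assumption
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        consistent, complementary, superfluous = z.chunk(3, dim=1)
        recon = self.decoder(z)
        return consistent, complementary, superfluous, recon

# Toy usage: the consistent parts of two views could then be pulled together
# (e.g. with an MSE or contrastive term) while reconstruction keeps each code
# informative about its own view.
ae = DecoupledViewAutoEncoder(in_dim=784)
c, p, s, r = ae(torch.randn(16, 784))
print(c.shape, r.shape)  # torch.Size([16, 32]) torch.Size([16, 784])
```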
DuPMAM: An Efficient Dual Perception Framework Equipped With a Sharp Testing Strategy for Point Cloud Analysis
IF 8.4, CAS Q1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-27 DOI: 10.1109/TMM.2024.3521735
Yijun Chen;Xianwei Zheng;Zhulun Yang;Xutao Li;Jiantao Zhou;Yuanman Li
{"title":"DuPMAM: An Efficient Dual Perception Framework Equipped With a Sharp Testing Strategy for Point Cloud Analysis","authors":"Yijun Chen;Xianwei Zheng;Zhulun Yang;Xutao Li;Jiantao Zhou;Yuanman Li","doi":"10.1109/TMM.2024.3521735","DOIUrl":"https://doi.org/10.1109/TMM.2024.3521735","url":null,"abstract":"The challenges in point cloud analysis are primarily attributed to the irregular and unordered nature of the data. Numerous existing approaches, inspired by the Transformer, introduce attention mechanisms to extract the 3D geometric features. However, these intricate geometric extractors incur high computational overhead and unfavorable inference latency. To tackle this predicament, in this paper, we propose a lightweight and faster attention-based network, named Dual Perception MAM (DuPMAM), for point cloud analysis. Specifically, we present a novel simple Point Multiplicative Attention Mechanism (PMAM). It is implemented solely through single feed-forward fully connected layers, hence leading to lower model complexity and superior inference speed. Based on that, we further devise a dual perception strategy by constructing both a local attention block and a global attention block to learn fine-grained geometric and overall representational features, respectively. Consequently, compared to the existing approaches, our method has excellent perception of local details and global contours of the point cloud objects. In addition, we ingeniously design a Graph-Multiscale Perceptual Field (GMPF) testing strategy for model performance enhancement. It has significant advantage over the traditional voting strategy and is generally applicable to point cloud tasks, encompassing classification, part segmentation and indoor scene segmentation. Empowered by the GMPF testing strategy, DuPMAM delivers the new State-of-the-Art on the real-world dataset ScanObjectNN, the synthetic dataset ModelNet40 and the part segmentation dataset ShapeNet, and compared to the recent GB-Net, our DuPMAM trains 6 times faster and tests 2 times faster.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1760-1771"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143800740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
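The abstract emphasizes an attention mechanism built solely from single fully connected layers. Below is a hedged sketch of a multiplicative attention over each point's grouped neighbors using only linear projections; the exact scoring form and the k-NN grouping interface are assumptions, not necessarily the PMAM of DuPMAM.

```python
import torch
import torch.nn as nn

class PointMultiplicativeAttention(nn.Module):
    """Illustrative attention over each point's neighborhood built only from
    single fully connected layers: attention scores come from an element-wise
    product of the projected center and neighbor features."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # projects the center point
        self.k = nn.Linear(dim, dim)   # projects its neighbors
        self.v = nn.Linear(dim, dim)   # values aggregated by the attention

    def forward(self, center, neighbors):
        # center: (B, N, C); neighbors: (B, N, K, C) grouped k-NN features.
        q = self.q(center).unsqueeze(2)                   # (B, N, 1, C)
        k = self.k(neighbors)                             # (B, N, K, C)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5     # (B, N, K)
        weights = scores.softmax(dim=-1).unsqueeze(-1)    # (B, N, K, 1)
        return (weights * self.v(neighbors)).sum(dim=2)   # (B, N, C)

# Toy usage: 1024 points, 16 neighbors each, 64-d features.
attn = PointMultiplicativeAttention(dim=64)
out = attn(torch.randn(2, 1024, 64), torch.randn(2, 1024, 16, 64))
print(out.shape)  # torch.Size([2, 1024, 64])
```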