SOFW: A Synergistic Optimization Framework for Indoor 3D Object Detection

Impact factor 8.4; Tier 1 (Computer Science); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia. Pub. date: 2025-01-01. DOI: 10.1109/TMM.2024.3521782

Kun Dai, Zhiqiang Jiang, Tao Xie, Ke Wang, Dedong Liu, Zhendong Fan, Ruifeng Li, Lijun Zhao, Mohamed Omar

Vol. 27, pp. 637-651. IEEE Xplore: https://ieeexplore.ieee.org/document/10819977/

Citations: 0

Abstract

In this work, we observe that indoor 3D object detection across varied scene domains involves both universal attributes and domain-specific features. Based on this insight, we propose SOFW, a synergistic optimization framework that investigates the feasibility of optimizing 3D object detection jointly across several dataset domains. The core of SOFW is to identify domain-shared parameters that encode universal scene attributes, while employing domain-specific parameters to capture the particularities of each scene domain. Technically, we introduce a set abstraction alteration strategy (SAAS) that embeds learnable domain-specific features into set abstraction layers, giving the network a refined understanding of each scene domain. In addition, we develop an element-wise sharing strategy (ESS) that lets each network layer adaptively decide, element by element, which of its parameters are domain-shared and which are domain-specific. Benefiting from these techniques, SOFW crafts feature representations for each scene domain by learning domain-specific parameters, while encoding generic attributes and contextual interdependencies via domain-shared parameters. Built upon the classical detection framework VoteNet without any complicated modules, SOFW delivers strong performance on multiple benchmarks with a much smaller total storage footprint. Additionally, we demonstrate that ESS is a universal strategy: applying it to the voxel-based approach TR3D achieves cutting-edge detection accuracy on all three datasets, S3DIS, ScanNet, and SUN RGB-D.
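VoteNet's backbone is PointNet++, whose set abstraction (SA) layers sample centroids with farthest point sampling, group neighbours with a ball query, encode each neighbour with a shared MLP, and max-pool per neighbourhood. The abstract states only that SAAS "embeds learnable domain-specific features into set abstraction layers". The sketch below assumes one simple form of this: a learnable embedding per scene domain is concatenated to every grouped point before the shared MLP. The class `DomainAwareSetAbstraction`, the embedding size, and the single-layer MLP are hypothetical choices for illustration (written in plain Python so it runs without dependencies); the paper's actual mechanism may differ.

```python
import math
import random


def farthest_point_sample(points, k):
    """Return k indices; each new index is the point farthest from those already chosen."""
    chosen = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(idx)
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return chosen


def ball_query(points, center, radius, max_n):
    """Indices of at most max_n points lying within `radius` of `center`."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius][:max_n]


class DomainAwareSetAbstraction:
    """Hypothetical SA layer: a learnable per-domain embedding is appended
    to every grouped point before the shared point-wise MLP."""

    def __init__(self, in_dim, out_dim, domains, emb_dim=2, seed=0):
        rng = random.Random(seed)
        # Domain-specific parameters: one small embedding vector per scene domain.
        self.domain_emb = {d: [rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for d in domains}
        # Domain-shared parameters: one linear layer used for every domain.
        mlp_in = 3 + in_dim + emb_dim  # relative xyz + point feature + domain embedding
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(mlp_in)] for _ in range(out_dim)]

    def mlp(self, v):
        # Linear layer followed by ReLU.
        return [max(0.0, sum(wi * vi for wi, vi in zip(row, v))) for row in self.w]

    def forward(self, points, feats, domain, n_centroids, radius, max_n):
        emb = self.domain_emb[domain]
        centroids = farthest_point_sample(points, n_centroids)
        new_feats = []
        for c in centroids:
            # Never empty for max_n >= 1: the centroid is at distance 0 from itself.
            group = ball_query(points, points[c], radius, max_n)
            encoded = []
            for i in group:
                rel = [a - b for a, b in zip(points[i], points[c])]
                encoded.append(self.mlp(rel + list(feats[i]) + emb))
            # Channel-wise max-pool over the local neighbourhood.
            new_feats.append([max(col) for col in zip(*encoded)])
        return [points[c] for c in centroids], new_feats


# Usage: 32 random points with a 1-D feature, processed as a ScanNet scene.
rng = random.Random(1)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(32)]
feats = [[rng.random()] for _ in pts]
sa = DomainAwareSetAbstraction(in_dim=1, out_dim=4, domains=["ScanNet", "SUN RGB-D"])
xyz, f = sa.forward(pts, feats, "ScanNet", n_centroids=8, radius=0.4, max_n=16)
print(len(xyz), len(f[0]))  # 8 4
```

In this form the only domain-specific parameters are the small embeddings; the MLP weights, which make up most of the layer, are shared by all domains.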

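The abstract does not give the formulation of ESS. The sketch below shows one hypothetical way to realize element-wise sharing, under these assumptions: every weight element has a learnable score; during training, sigmoid(score) acts as a soft gate that blends a shared value with a domain-specific value; after training, gates are thresholded at 0.5, so each element is stored either once (shared) or once per domain (specific). The class `ElementWiseSharedLayer`, the score/gate parameterization, and the 0.5 threshold are illustrative assumptions, not the paper's method, and learning the scores jointly with the detection loss is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ElementWiseSharedLayer:
    """Hypothetical linear layer whose weights are split, element by element,
    into domain-shared and domain-specific entries (illustrative sketch)."""

    def __init__(self, n_in, n_out, domains, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_out = n_in, n_out
        self.domains = list(domains)
        size = n_in * n_out
        # One value per weight element, used by every domain.
        self.shared = [rng.gauss(0.0, 0.1) for _ in range(size)]
        # A private value per element and domain (kept in full during training).
        self.specific = {d: [rng.gauss(0.0, 0.1) for _ in range(size)] for d in self.domains}
        # Hypothetical learnable score per element; sigmoid(score) is the weight
        # given to the domain-specific value (training not shown).
        self.scores = [0.0] * size

    def gates(self, hard=False):
        soft = [sigmoid(s) for s in self.scores]
        if hard:
            # After training: an element is domain-specific iff its gate exceeds 0.5.
            return [1.0 if g > 0.5 else 0.0 for g in soft]
        return soft

    def weight(self, domain, hard=False):
        # Element-wise blend: g * specific + (1 - g) * shared.
        return [g * wd + (1.0 - g) * ws
                for g, ws, wd in zip(self.gates(hard), self.shared, self.specific[domain])]

    def forward(self, x, domain, hard=False):
        w = self.weight(domain, hard)
        # Row-major weights: output j uses w[j * n_in : (j + 1) * n_in].
        return [sum(w[j * self.n_in + i] * x[i] for i in range(self.n_in))
                for j in range(self.n_out)]

    def storage(self):
        # Values kept after binarization: shared elements once, specific ones per domain.
        n_specific = int(sum(self.gates(hard=True)))
        n_shared = len(self.scores) - n_specific
        return n_shared + len(self.domains) * n_specific


layer = ElementWiseSharedLayer(4, 3, ["S3DIS", "ScanNet", "SUN RGB-D"])
print(layer.storage())  # 12: every gate starts at 0.5, so all elements are shared
for i in range(3):
    layer.scores[i] = 5.0  # pretend training marked three elements as domain-specific
print(layer.storage())  # 18
```

With three domains and a 4-by-3 layer, marking 3 of the 12 elements as domain-specific gives 9 + 3 × 3 = 18 stored values, versus 3 × 12 = 36 for three separate copies of the layer; this is the kind of storage saving that sharing most parameters across domains makes possible.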

Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors, providing comprehensive coverage of multimedia research.