Open-world object detection with multi-dataset image–label matching

IF 4.9 · CAS Tier 3 (Computer Science) · JCR Q1, Computer Science, Hardware & Architecture
Zhe Fu, Shuo Yuan, Pengjun Cao, Jing Wei, Heng Wang, Gaoxiang Zhang, Bizheng Luo, Hong Zhang
{"title":"Open-world object detection with multi-dataset image–label matching","authors":"Zhe Fu ,&nbsp;Shuo Yuan ,&nbsp;Pengjun Cao ,&nbsp;Jing Wei ,&nbsp;Heng Wang ,&nbsp;Gaoxiang Zhang ,&nbsp;Bizheng Luo ,&nbsp;Hong Zhang","doi":"10.1016/j.compeleceng.2025.110742","DOIUrl":null,"url":null,"abstract":"<div><div>In real-world scenarios, many categories appear in target scenes that were not encountered during training, making existing video object detection methods unsuitable for open-world applications. This paper proposes an open-world object detection method based on multi-dataset image–label matching to tackle the challenges of open-world object detection. First, a multi-dataset image–label matching training strategy is proposed, which aligns image features with label text features from multiple datasets, an innovative matching classification loss function is designed to guide model training. Then, an image–label deep fusion module is constructed to strengthen the model’s ability to understand the correspondence between visual and textual descriptions, thereby improving the accuracy of matching label texts to corresponding regions in images. A decoupled, staged training method is employed, independently training the proposal generation and category classification stages to better adapt to the diversity and uncertainty of open-world scenarios. Finally, extensive comparative and ablation experiments validate the proposed method’s effectiveness on the open-world dataset LVIS, achieving an average improvement of about 2 percentage points over baseline methods in various evaluation metrics. Additionally, visualizations across different scenes are presented to intuitively demonstrate the method’s efficacy and advanced performance.</div></div>","PeriodicalId":50630,"journal":{"name":"Computers & Electrical Engineering","volume":"128 ","pages":"Article 110742"},"PeriodicalIF":4.9000,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Electrical Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0045790625006858","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

In real-world scenarios, many categories appear in target scenes that were not encountered during training, making existing video object detection methods unsuitable for open-world applications. To tackle these challenges, this paper proposes an open-world object detection method based on multi-dataset image–label matching. First, a multi-dataset image–label matching training strategy is proposed, which aligns image features with label text features drawn from multiple datasets, and an innovative matching classification loss function is designed to guide model training. Then, an image–label deep fusion module is constructed to strengthen the model's ability to understand the correspondence between visual and textual descriptions, thereby improving the accuracy of matching label texts to their corresponding image regions. A decoupled, staged training method is employed, in which the proposal generation and category classification stages are trained independently to better adapt to the diversity and uncertainty of open-world scenarios. Finally, extensive comparative and ablation experiments on the open-world dataset LVIS validate the proposed method's effectiveness, showing an average improvement of about 2 percentage points over baseline methods across various evaluation metrics. Additionally, visualizations across different scenes intuitively demonstrate the method's efficacy and strong performance.
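
The abstract's central mechanism, aligning region-level image features with label-text features pooled from multiple datasets' label spaces, can be sketched as below. This is a minimal illustration rather than the paper's published code: the CLIP-style cosine-similarity formulation, the temperature value, and all names (matching_classification_loss, region_feats, text_embeds) are assumptions.

```python
# Hedged sketch of a multi-dataset image-label matching loss. Assumption:
# region features are matched against text embeddings of all label names
# pooled across datasets, CLIP-style; the paper's exact loss may differ.
import torch
import torch.nn.functional as F

def matching_classification_loss(region_feats, text_embeds, targets, tau=0.07):
    """
    region_feats: (N, D) pooled features of N proposal regions.
    text_embeds:  (C, D) text-encoder embeddings of the C label names
                  gathered from all training datasets.
    targets:      (N,) index of the matching label for each region.
    tau:          temperature; 0.07 is a common CLIP-style default.
    """
    # L2-normalize both sides so the dot product is a cosine similarity.
    region_feats = F.normalize(region_feats, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    # (N, C) similarity logits over the unified multi-dataset label space.
    logits = region_feats @ text_embeds.t() / tau
    # Cross-entropy over the unified labels stands in for the paper's
    # matching classification loss.
    return F.cross_entropy(logits, targets)
```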
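
The image–label deep fusion module is described only at a high level; a common realization of such vision–language fusion is cross-attention in which label-text tokens query the image features. The sketch below assumes that design; the class name, layer sizes, and structure are illustrative, not taken from the paper.

```python
# Hedged sketch of an image-label deep fusion module, assumed here to be a
# cross-attention block where label-text tokens attend to image features.
import torch
import torch.nn as nn

class ImageLabelFusion(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, text_tokens, image_tokens):
        # text_tokens:  (B, C, D) label-text features used as queries.
        # image_tokens: (B, HW, D) flattened visual features as keys/values,
        # so each label token gathers its supporting visual evidence.
        attn_out, _ = self.cross_attn(text_tokens, image_tokens, image_tokens)
        fused = self.norm(text_tokens + attn_out)
        return fused + self.ffn(fused)
```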
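
The decoupled, staged training could follow a schedule like the one below, assuming a class-agnostic proposal stage trained first and a classification stage trained on frozen proposals afterwards. The interfaces here (proposal_net.propose and the loss signatures) are hypothetical placeholders, not the paper's API.

```python
# Hedged sketch of decoupled, staged training: the proposal generator and
# the category classifier are assumed to be optimized independently rather
# than end-to-end. Module interfaces are hypothetical.
import torch

def train_staged(proposal_net, classifier, loader, epochs=(12, 12)):
    # Stage 1: train class-agnostic proposal generation only.
    opt1 = torch.optim.AdamW(proposal_net.parameters(), lr=1e-4)
    for _ in range(epochs[0]):
        for images, boxes, labels in loader:
            loss = proposal_net(images, boxes)   # localization loss only
            opt1.zero_grad(); loss.backward(); opt1.step()

    # Stage 2: freeze the proposal stage, train category matching on top.
    for p in proposal_net.parameters():
        p.requires_grad_(False)
    opt2 = torch.optim.AdamW(classifier.parameters(), lr=1e-4)
    for _ in range(epochs[1]):
        for images, boxes, labels in loader:
            with torch.no_grad():
                proposals = proposal_net.propose(images)  # hypothetical API
            loss = classifier(images, proposals, labels)
            opt2.zero_grad(); loss.backward(); opt2.step()
```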
Source Journal
Computers & Electrical Engineering (Engineering: Electrical & Electronic)
CiteScore: 9.20
Self-citation rate: 7.00%
Publications per year: 661
Review time: 47 days
About the journal: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.