Open-world object detection with multi-dataset image–label matching
Zhe Fu, Shuo Yuan, Pengjun Cao, Jing Wei, Heng Wang, Gaoxiang Zhang, Bizheng Luo, Hong Zhang
Computers & Electrical Engineering, volume 128, Article 110742. Published 2025-10-16.
DOI: 10.1016/j.compeleceng.2025.110742
https://www.sciencedirect.com/science/article/pii/S0045790625006858
Abstract
In real-world scenarios, target scenes contain many categories that were not encountered during training, which makes existing video object detection methods unsuitable for open-world applications. This paper proposes an open-world object detection method based on multi-dataset image–label matching. First, a multi-dataset image–label matching training strategy aligns image features with label text features from multiple datasets, and a matching classification loss function is designed to guide model training. Then, an image–label deep fusion module is constructed to strengthen the model’s understanding of the correspondence between visual content and textual descriptions, improving the accuracy with which label texts are matched to the corresponding image regions. A decoupled, staged training method trains the proposal generation and category classification stages independently to better adapt to the diversity and uncertainty of open-world scenarios. Finally, comparative and ablation experiments on the open-world dataset LVIS validate the method’s effectiveness, showing an average improvement of about 2 percentage points over baseline methods across evaluation metrics. Visualizations across different scenes are also presented to demonstrate the method’s efficacy.
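The image–label matching idea can be illustrated with a minimal sketch: each region feature is scored against every label text embedding, and the classification loss for a region is computed only over the labels of the dataset that region came from. The abstract does not give the loss formulation, so the cosine-similarity scoring, the softmax cross-entropy, the temperature of 0.07, the per-dataset mask, and all function names below are assumptions for illustration, not the authors' method.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def matching_logits(region_feats, label_feats, temperature=0.07):
    """Score every region against every label embedding (hypothetical helper).

    region_feats: (R, D) image-region features.
    label_feats:  (L, D) text embeddings for the union of all dataset labels.
    Returns an (R, L) matrix of temperature-scaled cosine similarities.
    """
    return l2_normalize(region_feats) @ l2_normalize(label_feats).T / temperature

def matching_loss(logits, targets, dataset_mask):
    """Softmax cross-entropy restricted to each region's source-dataset labels.

    logits:       (R, L) region-to-label scores.
    targets:      (R,) index of the correct label for each region.
    dataset_mask: (R, L) boolean, True where the label belongs to the dataset
                  that the region was annotated in (assumed mechanism).
    """
    masked = np.where(dataset_mask, logits, -np.inf)   # ignore foreign labels
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    log_probs = masked - np.log(np.exp(masked).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Toy data: 5 labels in total; labels 0-2 belong to dataset A, 3-4 to dataset B.
rng = np.random.default_rng(0)
label_feats = rng.normal(size=(5, 8))
# Two regions whose features lie close to the embeddings of labels 0 and 3.
region_feats = label_feats[[0, 3]] + 0.01 * rng.normal(size=(2, 8))
targets = np.array([0, 3])
mask = np.array([[True, True, True, False, False],    # region from dataset A
                 [False, False, False, True, True]])  # region from dataset B

logits = matching_logits(region_feats, label_feats)
loss = matching_loss(logits, targets, mask)
predicted = np.where(mask, logits, -np.inf).argmax(axis=1)
```

Masking by source dataset is one common way to avoid penalizing a region for a label its dataset never annotated; at inference time the mask can be dropped so that any label text, including unseen ones, is scored against each region.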
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.