Elaborate Teacher: Improved Semi-Supervised Object Detection With Rich Image Exploiting
Xi Yang, Qiubai Zhou, Ziyu Wei, Hong Liu, Nannan Wang, Xinbo Gao
IEEE Transactions on Multimedia, vol. 26, pp. 11345-11357. Published 2024-09-02.
DOI: 10.1109/TMM.2024.3453040
URL: https://ieeexplore.ieee.org/document/10663070/
Impact factor: 8.4 (JCR Q1, Computer Science, Information Systems)
Citation count: 0
Abstract
Semi-Supervised Object Detection (SSOD) has shown remarkable results by leveraging image pairs within a teacher-student framework. A good strong augmentation method can generate richer images and reduce the influence of noise in pseudo-labels. However, existing data augmentation methods for SSOD do not consider instance-level information and therefore cannot make full use of unlabeled data. In addition, the current teacher-student framework in SSOD relies solely on pseudo-labeling, which may disregard some uncertain information. In this article, we introduce Elaborate Teacher, a new method that generates and exploits image pairs in a more refined manner. To enrich strongly augmented images, we propose a novel data augmentation method called Information-Aware Mixup Representation (IAMR). IAMR uses the teacher model's predictions as prior information and considers instance-level information, and it can be seamlessly integrated with existing SSOD data augmentation methods. Furthermore, to fully exploit the information in unlabeled data, we propose Enhanced Scale Consistency Regularization (ESCR), which considers consistency in both semantic space and feature space. Together, the new data augmentation method and the consistency regularization boost the performance of semi-supervised object detectors. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our method in leveraging unlabeled image information. Our method consistently outperforms the baseline method and improves mAP by 11.6% and 9.0% relative to the supervised baseline when using 5% and 10% of labeled data on MS-COCO, respectively.
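The abstract does not spell out how IAMR or ESCR are computed, so the following is only a rough illustration of the general ideas it names: filtering teacher predictions into pseudo-labels, blending pixels at the instance (box) level, and penalizing disagreement between two representations. Every name here (`Detection`, `select_pseudo_labels`, `mix_instance`, `consistency_loss`), the 0.7 threshold, and the fixed mixing weight `lam` are assumptions made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates
    label: str
    score: float  # teacher confidence in [0, 1]


def select_pseudo_labels(preds: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Keep only teacher predictions whose confidence reaches the threshold.

    The threshold value is an assumption for illustration.
    """
    return [p for p in preds if p.score >= threshold]


def mix_instance(target: Image, source: Image,
                 box: Tuple[int, int, int, int], lam: float = 0.5) -> Image:
    """Blend source pixels into target inside one box; pixels outside stay unchanged.

    This is a generic box-restricted mixup, not the paper's IAMR.
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in target]  # copy so the input is not mutated
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = lam * target[y][x] + (1.0 - lam) * source[y][x]
    return out


def consistency_loss(a: List[float], b: List[float]) -> float:
    """Mean squared difference between two equally sized vectors.

    A stand-in for a consistency term; the paper's ESCR may differ.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Usage: two teacher predictions, only the confident one is kept and mixed.
teacher_preds = [Detection((0, 0, 2, 2), "cat", 0.9),
                 Detection((1, 1, 3, 3), "dog", 0.3)]
kept = select_pseudo_labels(teacher_preds)
target = [[0.0] * 3 for _ in range(3)]
source = [[1.0] * 3 for _ in range(3)]
mixed = mix_instance(target, source, kept[0].box, lam=0.5)
loss = consistency_loss([1.0, 2.0], [1.0, 4.0])
```

Restricting the blend to a predicted box is one simple way to make an augmentation depend on instance-level information; a real detector pipeline would operate on image tensors and would also need to carry the mixed-in box labels into the training targets.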
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.