Bamboo: Building Mega-Scale Vision Dataset Continually with Human–Machine Synergy

Impact factor: 9.3 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)

International Journal of Computer Vision · Pub Date: 2025-05-13 · DOI: 10.1007/s11263-025-02450-2

Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu



Abstract

Large-scale datasets play a vital role in computer vision, but current datasets are annotated blindly, without differentiating among samples, which makes data collection inefficient and unscalable. The open question is how to build a mega-scale dataset actively. Although advanced active learning algorithms might be the answer, we found experimentally that they perform poorly in realistic annotation scenarios where out-of-distribution data is extensive. This work therefore proposes a novel active learning framework for realistic dataset annotation. Equipped with this framework, we build a high-quality vision dataset, Bamboo, which consists of 69M image classification annotations across 119K categories and 28M object bounding box annotations across 809 categories. We organize these categories in a hierarchical taxonomy integrated from several knowledge bases. The classification annotations are four times larger than ImageNet22K, and the detection annotations are three times larger than Objects365. Compared with models pre-trained on ImageNet22K and Objects365, models pre-trained on Bamboo achieve superior performance across various downstream tasks (6.2% gains on classification and 2.1% gains on detection). We believe our active learning framework and Bamboo are essential for future work. Code and dataset are available at https://github.com/ZhangYuanhan-AI/Bamboo.
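
The abstract does not spell out the framework itself, so the following is only a minimal sketch of the general idea it points to: active selection of samples for annotation when the unlabeled pool contains out-of-distribution (OOD) data. It is not the authors' method. The function names, the `ood_scores` input (assumed to come from some separate, hypothetical OOD detector), the threshold value, and the use of prediction entropy as the uncertainty measure are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(
    class_probs: List[Sequence[float]],
    ood_scores: Sequence[float],
    budget: int,
    ood_threshold: float = 0.5,
) -> List[int]:
    """Return indices of up to `budget` samples to send to annotators.

    Assumption: `ood_scores[i]` is a score in [0, 1] from a hypothetical
    OOD detector, where higher means "more likely out-of-distribution".
    """
    # Step 1: discard samples flagged as likely out-of-distribution,
    # so the annotation budget is not spent on irrelevant images.
    in_dist = [i for i, s in enumerate(ood_scores) if s <= ood_threshold]
    # Step 2: among the remaining samples, prefer those the current
    # model is least certain about (highest prediction entropy).
    ranked = sorted(in_dist, key=lambda i: entropy(class_probs[i]), reverse=True)
    return ranked[:budget]


# Toy unlabeled pool: softmax outputs of a hypothetical 3-class model.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain prediction
    [0.34, 0.33, 0.33],  # very uncertain, but flagged as OOD below
    [0.60, 0.30, 0.10],  # moderately uncertain prediction
]
pool_ood = [0.1, 0.2, 0.9, 0.3]

selected = select_for_annotation(pool_probs, pool_ood, budget=2)
print(selected)  # [1, 3]
```

Plain entropy ranking would pick sample 2 first, even though it is the one flagged as out-of-distribution; filtering before ranking is one simple way to avoid spending annotation effort on such samples.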


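Source journal

International Journal of Computer Vision (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore

29.80

Self-citation rate

2.10%

Articles published

163

Review time

6 months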

About the journal: The International Journal of Computer Vision (IJCV) publishes high-quality, original contributions to the science and engineering of computer vision, in 12 issues per year. Regular articles (up to 25 journal pages) present significant technical advances of broad interest to the field. Short articles (up to 10 pages) offer a faster publication path for novel results. Survey articles (up to 30 pages) provide critical evaluations of the state of the art or tutorial presentations of relevant topics. The journal also publishes book reviews, position papers, and editorials by prominent scientific figures. Authors are encouraged to provide supplementary material online, such as images, video sequences, data sets, and software, to aid understanding and reproducibility.