A Domain Robust Approach For Image Dataset Construction

Proceedings of the 24th ACM International Conference on Multimedia. Pub Date: 2016-10-01. DOI: 10.1145/2964284.2967213

Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang

Citations: 38

Abstract

There has been growing research interest in automatically constructing image datasets by collecting images from the Internet. However, existing methods tend to adapt poorly to new domains, a weakness known as the "dataset bias problem". To address this issue, we propose a novel image dataset construction framework that generalizes well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, and noisy query expansions are then filtered out. By treating each expansion as a "bag" and the images retrieved for it as "instances", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. With this approach, images from different data distributions are kept while noisy images are filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of the proposed approach.
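
The bag/instance framing in the abstract can be illustrated with a small sketch. This is not the authors' MIL formulation: the abstract does not give the objective or the solver. The sketch assumes each image already has a relevance score from some classifier, and it reads the "constrained positive bag" idea as "keep at least a fixed fraction of every bag". The names `Bag`, `filter_bag`, `min_positive_ratio`, and `threshold`, and all scores, are hypothetical.

```python
from dataclasses import dataclass
from math import ceil
from typing import Dict, List


@dataclass
class Bag:
    """One query expansion (the bag) and its retrieved images (the instances)."""
    expansion: str        # expanded query text, e.g. "police dog"
    scores: List[float]   # hypothetical per-image relevance scores


def filter_bag(bag: Bag, min_positive_ratio: float = 0.5,
               threshold: float = 0.0) -> List[int]:
    """Return the indices of the images kept from one positive bag.

    Assumed constraint: at least `min_positive_ratio` of the instances in a
    positive bag are treated as positive, so the top-ranked share is always
    kept. Beyond that share, an image survives only if its score exceeds
    `threshold`.
    """
    n = len(bag.scores)
    if n == 0:
        return []
    min_keep = ceil(min_positive_ratio * n)
    # Rank instance indices from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: bag.scores[i], reverse=True)
    kept = [i for rank, i in enumerate(ranked)
            if rank < min_keep or bag.scores[i] > threshold]
    return sorted(kept)


# Two expansions of the query "dog", with made-up scores.
bags = [
    Bag("police dog", [0.9, 0.4, -0.7, 0.1]),
    Bag("puppy dog", [-0.2, -0.5, -0.9]),
]
kept: Dict[str, List[int]] = {b.expansion: filter_bag(b) for b in bags}
print(kept)
```

Because every bag keeps its top-ranked share even when all of its scores are low, each expansion continues to contribute images. This is one simple way to retain images from different data distributions while still dropping the lowest-scoring ones.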
