一种图像数据集构建的领域鲁棒方法

Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang
{"title":"一种图像数据集构建的领域鲁棒方法","authors":"Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang","doi":"10.1145/2964284.2967213","DOIUrl":null,"url":null,"abstract":"There have been increasing research interests in automatically constructing image dataset by collecting images from the Internet. However, existing methods tend to have a weak domain adaptation ability, known as the \"dataset bias problem\". To address this issue, in this work, we propose a novel image dataset construction framework which can generalize well to unseen target domains. In specific, the given queries are first expanded by searching in the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, from which the noisy query expansions are then filtered out. By treating each expansion as a \"bag\" and the retrieved images therein as \"instances\", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. By this approach, images from different data distributions will be kept while with noisy images filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of our proposed approach.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"223 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":"{\"title\":\"A Domain Robust Approach For Image Dataset Construction\",\"authors\":\"Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang, Zhenmin Tang\",\"doi\":\"10.1145/2964284.2967213\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"There have been increasing research interests in automatically constructing image dataset by collecting images from the Internet. However, existing methods tend to have a weak domain adaptation ability, known as the \\\"dataset bias problem\\\". To address this issue, in this work, we propose a novel image dataset construction framework which can generalize well to unseen target domains. In specific, the given queries are first expanded by searching in the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, from which the noisy query expansions are then filtered out. By treating each expansion as a \\\"bag\\\" and the retrieved images therein as \\\"instances\\\", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. By this approach, images from different data distributions will be kept while with noisy images filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of our proposed approach.\",\"PeriodicalId\":140670,\"journal\":{\"name\":\"Proceedings of the 24th ACM international conference on Multimedia\",\"volume\":\"223 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"38\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 24th ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2964284.2967213\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2964284.2967213","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

摘要

通过采集互联网上的图像,自动构建图像数据集的研究日益受到关注。然而,现有的方法往往具有较弱的领域适应能力,被称为“数据集偏差问题”。为了解决这一问题,我们提出了一种新的图像数据集构建框架,该框架可以很好地泛化到未知的目标域。具体而言,首先通过在Google Books Ngrams corpus (GBNC)中搜索来扩展给定的查询,以获得更丰富的语义描述,然后从中过滤掉带有噪声的查询扩展。通过将每个扩展视为一个“袋”,并将其中的检索图像视为“实例”,我们将图像过滤制定为具有约束正袋的多实例学习(MIL)问题。通过这种方法,可以保留不同数据分布的图像,同时滤除噪声图像。在两个具有挑战性的任务上的综合实验证明了我们提出的方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Domain Robust Approach For Image Dataset Construction
There have been increasing research interests in automatically constructing image dataset by collecting images from the Internet. However, existing methods tend to have a weak domain adaptation ability, known as the "dataset bias problem". To address this issue, in this work, we propose a novel image dataset construction framework which can generalize well to unseen target domains. In specific, the given queries are first expanded by searching in the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, from which the noisy query expansions are then filtered out. By treating each expansion as a "bag" and the retrieved images therein as "instances", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. By this approach, images from different data distributions will be kept while with noisy images filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of our proposed approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信