Web scale photo hash clustering on a single machine

Yunchao Gong, Marcin Pawłowski, Fei Yang, Louis Brandy, Lubomir D. Bourdev, R. Fergus
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), published 2015-06-07
DOI: 10.1109/CVPR.2015.7298596 (https://doi.org/10.1109/CVPR.2015.7298596)
Citations: 81

Abstract

This paper addresses the problem of clustering a very large number of photos in a stream (i.e., hundreds of millions a day) into millions of clusters. This is particularly important given the popularity of photo-sharing websites such as Facebook, Google, and Instagram. Given the large number of photos available online, how to organize them efficiently is an open problem. To address this problem, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers, on which we can build hash indexes to speed up computation. The proposed method is capable of clustering millions of photos on a single machine in a few minutes. We show that this approach is usually several orders of magnitude faster than standard k-means and produces comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that is capable of clustering a large photo stream on a single machine, and show applications to spam detection and trending photo discovery.
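
The idea of clustering binary hash codes into binary centers can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the hash codes are already unpacked into 0/1 arrays, it assigns points by brute-force Hamming distance rather than through the hash index the abstract mentions, and it updates each center by a per-bit majority vote, which is one simple way to keep centers binary. The names hamming_distances, majority_vote, and binary_kmeans are hypothetical.

```python
import numpy as np


def hamming_distances(codes, centers):
    """Pairwise Hamming distances between 0/1 arrays of shape (n, d) and (k, d)."""
    # Compare every code with every center bit by bit and count the mismatches.
    return (codes[:, None, :] != centers[None, :, :]).sum(axis=2)


def majority_vote(members):
    """Per-bit majority of a (m, d) 0/1 array; ties resolve to 1."""
    return (2 * members.sum(axis=0) >= len(members)).astype(members.dtype)


def binary_kmeans(codes, k, n_iter=10, seed=0):
    """Illustrative k-means over binary codes with binary centers (brute force)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers from k distinct data points chosen at random.
    centers = codes[rng.choice(len(codes), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: nearest center in Hamming distance.
        labels = hamming_distances(codes, centers).argmin(axis=1)
        # Update step: majority vote per bit; empty clusters keep their old center.
        for j in range(k):
            members = codes[labels == j]
            if len(members) > 0:
                centers[j] = majority_vote(members)
    labels = hamming_distances(codes, centers).argmin(axis=1)
    return centers, labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 64-bit codes stand in for real image hashes.
    codes = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)
    centers, labels = binary_kmeans(codes, k=8)
    print(centers.shape, np.bincount(labels, minlength=8))
```

The brute-force assignment step costs O(nkd) per iteration, so it only works for small inputs. Because the centers stay binary, the nearest-center lookup could instead go through a hash index over the centers, which is the speedup the abstract describes.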