Efficient similarity search
H. Jégou
Frontiers of Multimedia Research, published 2017-12-19
DOI: 10.1145/3122865.3122871 (https://doi.org/10.1145/3122865.3122871)
Citations: 13
Abstract
This chapter addresses one of the fundamental problems in multimedia systems: efficient similarity search in large collections of multimedia content. This problem has received considerable attention from various research communities; in particular, it is a long-standing line of research in computational geometry and databases. The computer vision and multimedia communities have adopted pragmatic approaches guided by practical requirements: the large sets of features required to describe image collections make visual search a highly demanding task. As a result, early works in image indexing [Flickner et al. 1995, Fagin 1998, Beis and Lowe 1997] anticipated the interest in approximate algorithms, especially after methods based on local descriptors spread in the 1990s, since any improvement to the indexing stage improves the whole visual search system.
Among existing approximate nearest neighbor (ANN) strategies, the popular Locality-Sensitive Hashing (LSH) framework [Indyk and Motwani 1998, Gionis et al. 1999] provides theoretical guarantees on search quality under limited assumptions about the underlying data distribution. It was first proposed [Indyk and Motwani 1998] for the Hamming and ℓ1 spaces, and was later extended to the Euclidean and cosine cases [Charikar 2002, Datar et al. 2004] and to the earth mover's distance [Charikar 2002, Andoni and Indyk 2006]. LSH has been used successfully for local descriptors [Ke et al. 2004], 3D object indexing [Matei et al. 2006, Shakhnarovich et al. 2006], and in other fields such as audio retrieval [Casey and Slaney 2007, Ryynanen and Klapuri 2008]. It has also received some attention in the context of private information retrieval [Pathak and Raj 2012, Aghasaryan et al. 2013, Furon et al. 2013].
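As an illustration of the cosine case mentioned above, the following is a minimal sketch of sign random projections, a hashing scheme commonly associated with [Charikar 2002]. It is not taken from the chapter: the function names, dimensions, and seed are hypothetical choices made for this example. Each hash bit records on which side of a random hyperplane a vector falls, so the resulting binary code depends only on the direction of the vector, not on its norm.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def sketch(vec, hyperplanes):
    """One bit per hyperplane: 1 if the projection is non-negative, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0.0 else 0
        for h in hyperplanes
    )

def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=8, n_bits=64, seed=0)
x = [1.0, 0.5, -0.2, 0.0, 0.3, -1.0, 0.8, 0.1]
x_scaled = [3.0 * v for v in x]   # same direction, different norm
x_opposite = [-v for v in x]      # opposite direction

code_x = sketch(x, planes)
code_scaled = sketch(x_scaled, planes)
code_opposite = sketch(x_opposite, planes)
```

Under these assumptions, `x` and `x_scaled` receive the same code, while `x_opposite` differs from `x` in every bit; for intermediate cases the Hamming distance between codes grows with the angle between the vectors.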
A few years ago, compression-inspired approaches, and more specifically quantization-based approaches [Jégou et al. 2011], were shown to be a viable alternative to hashing methods and to search efficiently in a billion-sized dataset.
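One well-known quantization-based scheme is product quantization, in which a vector is split into sub-vectors and each sub-vector is replaced by the index of its nearest centroid. The sketch below illustrates that general idea only; it is not the implementation from the cited work. The codebooks are hand-picked and hypothetical (in practice they would be learned from data, for example with k-means), and all names are assumptions made for this example.

```python
# Hypothetical hand-picked codebooks: 2 subspaces of dimension 2,
# 4 centroids each. Real systems learn these from training data.
CODEBOOKS = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)],
]
SUB_DIM = 2

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec):
    """Cut a vector into consecutive sub-vectors of length SUB_DIM."""
    return [tuple(vec[i:i + SUB_DIM]) for i in range(0, len(vec), SUB_DIM)]

def encode(vec):
    """Replace each sub-vector by the index of its nearest centroid."""
    return tuple(
        min(range(len(book)), key=lambda j: sq_dist(sub, book[j]))
        for sub, book in zip(split(vec), CODEBOOKS)
    )

def decode(code):
    """Reconstruct an approximate vector from its centroid indices."""
    out = []
    for j, book in zip(code, CODEBOOKS):
        out.extend(book[j])
    return out

def distance_tables(query):
    """Per subspace, squared distances from the query to every centroid."""
    return [[sq_dist(sub, c) for c in book]
            for sub, book in zip(split(query), CODEBOOKS)]

def approx_sq_dist(tables, code):
    """Distance from the uncompressed query to a compressed vector,
    obtained by summing one table entry per subspace."""
    return sum(table[j] for table, j in zip(tables, code))

database = [[0.1, 0.9, 1.8, 0.2], [0.9, 0.1, 0.1, 1.9], [1.0, 1.0, 2.0, 2.0]]
codes = [encode(v) for v in database]

query = [0.0, 1.0, 2.0, 0.0]
tables = distance_tables(query)
ranked = sorted(range(len(codes)), key=lambda i: approx_sq_dist(tables, codes[i]))
```

Here each database vector is stored as two small integers, and the query is compared with the compressed vectors through table lookups alone. The estimated distance equals the exact squared distance between the query and the reconstructed (decoded) vector.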
This chapter discusses these different trends and is organized as follows. Section 5.1 gives background references and concepts, including evaluation issues. Most of the methods and variants are presented within the LSH framework; it is worth noting that LSH is more of a concept than a particular algorithm. The search algorithms associated with LSH follow two distinct search mechanisms, the cell-probe model and sketches, which are discussed in Sections 5.2 and 5.3, respectively. Section 5.4 describes methods inspired by compression algorithms, while Section 5.5 discusses hybrid approaches that combine the non-exhaustiveness of the cell-probe model with the advantages of sketches or compression-based algorithms. Metrics other than Euclidean and cosine are briefly discussed in Section 5.6.
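To make the non-exhaustive nature of the cell-probe model concrete, here is a minimal sketch for binary vectors in Hamming space, assuming bit-sampling hash functions and several hash tables. The function names, table count, and key length are hypothetical choices for this example, not values from the chapter. Only vectors that share a cell with the query in at least one table are compared with it, whereas a sketch-based search would compare compact codes against the whole collection.

```python
import random
from collections import defaultdict

def make_bit_samplers(n_tables, bits_per_key, dim, seed=0):
    """For each table, pick a random subset of bit positions (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), bits_per_key) for _ in range(n_tables)]

def cell_key(vec, positions):
    """Hash key of a vector: the bits found at the sampled positions."""
    return tuple(vec[p] for p in positions)

def build_index(vectors, samplers):
    """One hash table per sampler, mapping a cell key to vector indices."""
    tables = [defaultdict(list) for _ in samplers]
    for idx, vec in enumerate(vectors):
        for table, positions in zip(tables, samplers):
            table[cell_key(vec, positions)].append(idx)
    return tables

def probe(query, tables, samplers):
    """Collect candidates: vectors sharing a cell with the query in any table."""
    candidates = set()
    for table, positions in zip(tables, samplers):
        candidates.update(table.get(cell_key(query, positions), []))
    return candidates

def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def search(query, vectors, tables, samplers):
    """Rank only the probed candidates; returns None if no cell matches."""
    candidates = probe(query, tables, samplers)
    if not candidates:
        return None
    return min(candidates, key=lambda i: hamming(query, vectors[i]))

data_rng = random.Random(1)
vectors = [[data_rng.randint(0, 1) for _ in range(32)] for _ in range(200)]
samplers = make_bit_samplers(n_tables=4, bits_per_key=8, dim=32, seed=0)
tables = build_index(vectors, samplers)
best = search(vectors[0], vectors, tables, samplers)
```

A query identical to a stored vector always lands in that vector's cells, so it is retrieved with certainty. A query that is merely close is retrieved with a probability that depends on the number of tables and the key length, which is the usual trade-off between recall and the number of candidates inspected.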