Unsupervised learning of co-occurrences for face images retrieval

Proceedings of the 2nd ACM International Conference on Multimedia in Asia. Pub Date: 2021-03-07. DOI: 10.1145/3444685.3446265

Thomas Petit, Pierre Letessier, S. Duffner, Christophe Garcia


Citations: 1

Abstract

Despite a huge leap in the performance of face recognition systems in recent years, some cases remain challenging for them while being trivial for humans. This is because the human brain exploits much more information than facial appearance to identify a person. In this work, we aim to capture the social context of unlabeled observed faces in order to improve face retrieval. In particular, we propose a framework that substantially improves face retrieval by exploiting the faces that occur simultaneously in a query's context to infer a multi-dimensional social context descriptor. Combining this compact structural descriptor with the individual visual face features in a common feature vector considerably increases the rate of correct face retrieval and makes it possible to disambiguate a large proportion of query results that show different persons who are barely distinguishable visually. To evaluate our framework, we also introduce a new large dataset of faces of French TV personalities, organised into TV shows in order to capture the co-occurrence relations between people. On this dataset, our framework improves the mean Average Precision over a set of internal queries from 67.93% (using only facial features extracted with a state-of-the-art pre-trained model) to 78.16% (using both facial features and face co-occurrences), and from 67.88% to 77.36% over a set of external queries.
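
The abstract does not say how the social context descriptor is computed or how it is merged with the visual features beyond "a common feature vector". The following is a minimal sketch, not the authors' implementation. It assumes both descriptors are already available as vectors, that fusion is plain concatenation after L2 normalization with a hypothetical `weight` parameter, and that retrieval ranks gallery faces by Euclidean distance. All function names and the toy data are illustrative. The sketch also shows how mean Average Precision, the metric reported above, is computed from a ranked list.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def fuse(visual, context, weight=1.0):
    """Concatenate normalized visual features with a weighted, normalized
    context descriptor (concatenation is an assumption of this sketch)."""
    return np.concatenate(
        [l2_normalize(visual), weight * l2_normalize(context)], axis=-1
    )

def rank_gallery(query, gallery):
    """Return gallery indices sorted from nearest to farthest."""
    distances = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(distances, kind="stable")

def average_precision(ranked_relevance):
    """Average precision for one ranked list of 0/1 relevance flags."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(queries, query_ids, gallery, gallery_ids):
    """Mean of the per-query average precisions."""
    aps = []
    for q, qid in zip(queries, query_ids):
        order = rank_gallery(q, gallery)
        aps.append(average_precision(gallery_ids[order] == qid))
    return float(np.mean(aps))

# Toy data (invented for illustration): identities 0 and 1 look almost
# identical, but they appear in different contexts.
gallery_visual = np.array([[1.0, 0.00], [1.0, 0.02], [1.0, 0.10]])
gallery_context = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
gallery_ids = np.array([0, 1, 0])

query_visual = np.array([[1.0, 0.03]])
query_context = np.array([[1.0, 0.0]])
query_ids = np.array([0])

# Retrieval with visual features only.
map_visual = mean_average_precision(
    l2_normalize(query_visual), query_ids,
    l2_normalize(gallery_visual), gallery_ids,
)

# Retrieval with visual features and context descriptor combined.
map_fused = mean_average_precision(
    fuse(query_visual, query_context), query_ids,
    fuse(gallery_visual, gallery_context), gallery_ids,
)

print(f"mAP, visual only: {map_visual:.4f}")
print(f"mAP, fused:       {map_fused:.4f}")
```

In this toy setup the visually closest gallery face belongs to the wrong person, so the visual-only ranking places it first, while the fused vector pushes it behind both correct matches. The `weight` parameter controls how strongly context can override appearance and would need tuning on real data.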



