利用高效负抽样改进协同度量学习

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval Pub Date : 2019-07-18 DOI:10.1145/3331184.3331337

Viet-Anh Tran, Romain Hennequin, Jimena Royo-Letelier, Manuel Moussallam

{"title":"利用高效负抽样改进协同度量学习","authors":"Viet-Anh Tran, Romain Hennequin, Jimena Royo-Letelier, Manuel Moussallam","doi":"10.1145/3331184.3331337","DOIUrl":null,"url":null,"abstract":"Distance metric learning based on triplet loss has been applied with success in a wide range of applications such as face recognition, image retrieval, speaker change detection and recently recommendation with the Collaborative Metric Learning (CML) model. However, as we show in this article, CML requires large batches to work reasonably well because of a too simplistic uniform negative sampling strategy for selecting triplets. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose here a 2-stage negative sampling strategy which finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"155 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Improving Collaborative Metric Learning with Efficient Negative Sampling\",\"authors\":\"Viet-Anh Tran, Romain Hennequin, Jimena Royo-Letelier, Manuel Moussallam\",\"doi\":\"10.1145/3331184.3331337\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distance metric learning based on triplet loss has been applied with success in a wide range of applications such as face recognition, image retrieval, speaker change detection and recently recommendation with the Collaborative Metric Learning (CML) model. However, as we show in this article, CML requires large batches to work reasonably well because of a too simplistic uniform negative sampling strategy for selecting triplets. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose here a 2-stage negative sampling strategy which finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets.\",\"PeriodicalId\":20700,\"journal\":{\"name\":\"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"volume\":\"155 2\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3331184.3331337\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3331184.3331337","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 16

摘要

基于三重态损失的距离度量学习在人脸识别、图像检索、说话人变化检测以及最近的协同度量学习(CML)模型推荐等广泛应用中取得了成功。然而，正如我们在本文中所展示的，CML需要大量的批处理才能很好地工作，因为选择三元组的统一负采样策略过于简单。由于内存限制，这使得难以在高维场景中进行扩展。为了缓解这个问题，我们在这里提出了一个两阶段的负抽样策略，该策略可以找到对学习具有高度信息的三胞胎。我们的策略允许CML在准确性和流行偏差方面有效地工作，即使批处理大小比默认均匀抽样所需的小一个数量级。我们证明了所提出的推荐策略的适用性，并在各种数据集上展示了一致的积极结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improving Collaborative Metric Learning with Efficient Negative Sampling

Distance metric learning based on triplet loss has been applied with success in a wide range of applications such as face recognition, image retrieval, speaker change detection and recently recommendation with the Collaborative Metric Learning (CML) model. However, as we show in this article, CML requires large batches to work reasonably well because of a too simplistic uniform negative sampling strategy for selecting triplets. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose here a 2-stage negative sampling strategy which finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

自引率

0.00%

发文量