Beyond Hard Negatives in Product Search: Semantic Matching Using One-Class Classification (SMOCC)

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. Pub Date: 2023-02-27. DOI: 10.1145/3539597.3570488

Arindam Bhattacharya, Ankit Gandhi, Vijay Huddar, Ankith M S, Aayush Moroney, Atul Saroop, Rahul Bhagat


Citations: 1

Abstract

Semantic matching is an important component of a product search pipeline. Its goal is to capture the semantic intent of the search query, as opposed to the syntactic matching performed by a lexical matching system. A semantic matching model captures relationships such as synonyms, and also captures common behavioral patterns, retrieving relevant results by generalizing from purchase data. Such models, however, suffer from a lack of informative negative examples for training. Various methods based on hard-negative mining and contrastive learning have been proposed to address this issue. In this work, we propose SMOCC, a novel method for semantic matching based on one-class classification. Given a query and a relevant product, SMOCC generates the representation of an informative negative, which is then used to train the model. The method generates negatives by adversarial search in the neighborhood of the positive examples. We also propose a novel approach for selecting the radius within which adversarial negative products are generated around queries, based on the model's understanding of the query. Depending on how the radius is selected, we propose two variants of our method: SMOCC-QS, which quantizes the queries using their specificity, and SMOCC-EM, which uses an expectation-maximization paradigm to iteratively learn the best radius. We show that our method outperforms state-of-the-art hard-negative mining approaches, increasing purchase recall by 3 percentage points and improving the percentage of exact matches retrieved by up to 5 percentage points, while reducing irrelevant results by 1.8 percentage points.
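The idea of searching for an adversarial negative in the neighborhood of a positive can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract gives no algorithmic details. The sketch assumes a dot-product relevance score between query and product embeddings, constrains the negative to a sphere of a given radius around the positive product embedding, and uses projected gradient ascent to find the point on that sphere that the scorer rates highest. The function name `generate_adversarial_negative` and the parameters `steps`, `lr`, and `seed` are hypothetical.

```python
import numpy as np


def generate_adversarial_negative(query_emb, positive_emb, radius,
                                  steps=20, lr=0.1, seed=0):
    """Illustrative sketch only (assumed dot-product scorer, not SMOCC itself).

    Returns a synthetic negative embedding at distance `radius` from
    `positive_emb` that approximately maximizes the score with `query_emb`.
    """
    rng = np.random.default_rng(seed)

    # Start from a random point on the sphere around the positive.
    direction = rng.normal(size=positive_emb.shape)
    negative = positive_emb + radius * direction / np.linalg.norm(direction)

    for _ in range(steps):
        # For score(q, x) = q . x, the gradient with respect to x is q.
        negative = negative + lr * query_emb

        # Project back onto the sphere of the given radius.
        offset = negative - positive_emb
        negative = positive_emb + radius * offset / np.linalg.norm(offset)

    return negative


# Toy usage with 2-D embeddings (values are made up for illustration).
query = np.array([1.0, 0.0])
positive = np.array([0.0, 1.0])
negative = generate_adversarial_negative(query, positive, radius=0.5)
```

In this sketch a larger `radius` yields negatives farther from the positive; choosing it per query is the part the abstract attributes to the SMOCC-QS and SMOCC-EM variants, which are not modeled here.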

