Generalized Learning of Neural Network Based Semantic Similarity Models and Its Application in Movie Search

Xugang Ye, Zijie Qi, Xinying Song, Xiaodong He, Dan Massey
{"title":"Generalized Learning of Neural Network Based Semantic Similarity Models and Its Application in Movie Search","authors":"Xugang Ye, Zijie Qi, Xinying Song, Xiaodong He, Dan Massey","doi":"10.1109/ICDMW.2015.34","DOIUrl":null,"url":null,"abstract":"Modeling text semantic similarity via neural network approaches has significantly improved performance on a set of information retrieval tasks in recent studies. However these neural network based latent semantic models are mostly trained by using simple user behavior logging data such as clicked (query, document)-pairs, and all the clicked pairs are assumed to be uniformly positive examples. Therefore, the existing method for learning the model parameters does not differentiate data samples that might reflect different relevance information. In this paper, we relax this assumption and propose a new learning method through a generalized loss function to capture the subtle relevance differences of training samples when a more granular label structure is available. We have applied it to the Xbox One's movie search task where session-based user behavior information is available and the granular relevance differences of training samples are derived from the session logs. Compared with the existing method, our new generalized loss function has demonstrated superior test performance measured by several user-engagement metrics. It also yields significant performance lift when the score computed from our model is used as a semantic similarity feature in the gradient boosted decision tree model which is widely used in modern search engines.","PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2015.34","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Modeling text semantic similarity via neural network approaches has significantly improved performance on a set of information retrieval tasks in recent studies. However, these neural-network-based latent semantic models are mostly trained on simple user behavior logs such as clicked (query, document) pairs, and all clicked pairs are assumed to be uniformly positive examples. The existing method for learning the model parameters therefore does not differentiate data samples that might reflect different relevance information. In this paper, we relax this assumption and propose a new learning method built on a generalized loss function that captures the subtle relevance differences among training samples when a more granular label structure is available. We have applied it to the Xbox One's movie search task, where session-based user behavior information is available and the granular relevance differences of training samples are derived from the session logs. Compared with the existing method, our new generalized loss function demonstrates superior test performance measured by several user-engagement metrics. It also yields a significant performance lift when the score computed from our model is used as a semantic similarity feature in the gradient boosted decision tree model that is widely used in modern search engines.
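The page does not include code, but the core idea of the generalized loss can be illustrated with a short sketch: instead of treating every clicked (query, document) pair as an equally positive example, each pair is weighted by a graded relevance label derived from session logs. The sketch below is a minimal, assumption-laden illustration in PyTorch; the function name, the `grades` weighting scheme, and the use of in-batch negatives are illustrative choices, not the authors' exact formulation.

```python
# Hypothetical sketch of a graded-relevance-weighted softmax loss,
# in the spirit of the paper's generalized loss (names are illustrative).
import torch
import torch.nn.functional as F

def generalized_similarity_loss(query_vecs, doc_vecs, grades, gamma=10.0):
    """
    query_vecs: (B, d) query embeddings from the query tower
    doc_vecs:   (B, d) embeddings of the clicked documents (one per query);
                other rows in the batch serve as sampled negatives
    grades:     (B,) relevance weights derived from session logs
                (assumption: e.g. a click followed by a long watch > a quick abandon)
    gamma:      smoothing factor on the cosine similarity, as in standard DSSM-style models
    """
    q = F.normalize(query_vecs, dim=1)
    d = F.normalize(doc_vecs, dim=1)
    sims = gamma * (q @ d.t())                               # (B, B) scaled cosine similarities
    targets = torch.arange(q.size(0), device=sims.device)    # row i's positive is column i
    per_pair = F.cross_entropy(sims, targets, reduction="none")
    return (grades * per_pair).mean()                        # graded pairs contribute proportionally
```

In use, `grades` would encode session-derived signals of engagement quality; setting all grades to 1 recovers the conventional loss in which every clicked pair counts as a uniformly positive example.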