TOP-MATA: A Max-First Traversal Method for Top-K Cosine Similarity Search

Shiwei Zhu, Junjie Wu, Guoping Xia, Limin Li
Published in: 2010 7th International Conference on Service Systems and Service Management (2010-06-28)
DOI: 10.1109/ICSSSM.2010.5530100 (https://doi.org/10.1109/ICSSSM.2010.5530100)
Citations: 2

Abstract

Recent years have seen growing interest in computing cosine similarities between documents (or commodities). Most previous studies require the user to specify a minimum similarity threshold for cosine similarity search, but an appropriate threshold is usually difficult to choose in practice. In this paper, we instead propose searching for the top-K most strongly related pairs of objects as measured by cosine similarity. Specifically, we first define the cosine similarity measure from an association-analysis point of view and identify the monotone property of an upper bound of the cosine measure; we then exploit a Max-First traversal strategy to develop the TOP-MATA algorithm. Compared with the previous TOP-DATA method, TOP-MATA avoids the computations for false-positive item pairs. Finally, experimental results demonstrate the computational efficiency of the algorithm.
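
The idea of a max-first traversal over a monotone upper bound can be illustrated with a small sketch. This is not the authors' TOP-MATA implementation; the abstract does not give its details. The sketch assumes binary transaction data, takes the association-style cosine to be supp(AB) / sqrt(supp(A) · supp(B)), and uses the bound sqrt(supp(B) / supp(A)) for supp(A) ≥ supp(B), which follows from supp(AB) ≤ supp(B). The function name `top_k_cosine_pairs` and the frontier-seeding scheme are hypothetical choices made for illustration.

```python
import heapq
import math


def top_k_cosine_pairs(transactions, k):
    """Return up to k item pairs with the highest cosine, best first.

    Illustrative sketch only: items are treated as binary vectors over
    transactions, and candidate pairs are visited in order of an upper bound.
    """
    if k <= 0:
        return []

    # Map each item to the set of transaction ids that contain it.
    tids = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tids.setdefault(item, set()).add(tid)

    # Sort items by support, descending, so that supp[i] >= supp[j] for i < j.
    order = sorted(tids, key=lambda it: (-len(tids[it]), str(it)))
    supp = [len(tids[it]) for it in order]
    n = len(order)

    def upper(i, j):
        # Assumed bound: cos(i, j) <= sqrt(supp[j] / supp[i]) when i < j.
        # For a fixed i it never increases as j grows (the monotone property).
        return math.sqrt(supp[j] / supp[i])

    # Max-heap (negated keys) seeded with the best-bounded pair of each row.
    frontier = [(-upper(i, i + 1), i, i + 1) for i in range(n - 1)]
    heapq.heapify(frontier)

    best = []  # min-heap of (cosine, i, j) holding at most k entries
    while frontier:
        neg_ub, i, j = heapq.heappop(frontier)
        if len(best) == k and -neg_ub <= best[0][0]:
            break  # no unvisited pair can exceed the current k-th best cosine
        common = len(tids[order[i]] & tids[order[j]])
        cos = common / math.sqrt(supp[i] * supp[j])
        if len(best) < k:
            heapq.heappush(best, (cos, i, j))
        elif cos > best[0][0]:
            heapq.heapreplace(best, (cos, i, j))
        if j + 1 < n:
            # The next pair in row i has the next-largest bound for that row.
            heapq.heappush(frontier, (-upper(i, j + 1), i, j + 1))

    ranked = sorted(best, key=lambda e: -e[0])
    return [(order[i], order[j], cos) for cos, i, j in ranked]
```

Because the pair with the largest remaining upper bound is always examined next, the search can stop as soon as that bound no longer exceeds the K-th best exact cosine found so far; pairs tied with the K-th best value may be left out.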