TOP-MATA: A Max-First traversal method for top-K cosine similarity search
Shiwei Zhu, Junjie Wu, Guoping Xia, Limin Li
2010 7th International Conference on Service Systems and Service Management, 2010-06-28
https://doi.org/10.1109/ICSSSM.2010.5530100
Recent years have witnessed an increased interest in computing cosine similarities between documents (or commodities). Most previous studies require the user to specify a minimum similarity threshold before a cosine similarity search can be performed. In practice, however, it is usually difficult for users to provide an appropriate threshold. In this paper, we instead propose to search for the top-K most strongly related pairs of objects as measured by the cosine similarity. Specifically, we first define the cosine similarity measure from the association analysis point of view and identify the monotone property of an upper bound of the cosine measure. We then exploit a Max-First traversal strategy to develop the TOP-MATA algorithm. Compared with the previous TOP-DATA method, TOP-MATA has the advantage of avoiding computations for false-positive item pairs. Finally, experimental results demonstrate the computational efficiency of the algorithm.
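The abstract states neither the upper bound nor the traversal order explicitly, so the following Python sketch is only an illustration under stated assumptions, not the authors' TOP-MATA procedure. It assumes the support-based form of cosine for binary transaction data, cos(A, B) = supp(A, B) / sqrt(supp(A) · supp(B)), and the bound sqrt(supp(B) / supp(A)) for supp(A) ≥ supp(B), which follows from supp(A, B) ≤ supp(B). With items sorted by descending support, this bound can only shrink along each row, so a best-first search over a max-heap of bounds can stop once the largest remaining bound cannot beat the current K-th best pair. The function name `top_k_cosine_pairs` and the sample data are hypothetical.

```python
import heapq
from math import sqrt


def top_k_cosine_pairs(transactions, k):
    """Return up to k item pairs with the highest cosine, as (cosine, a, b)."""
    if k <= 0:
        return []

    # Record, for every item, the set of transaction ids that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    # Sort items by descending support (ties broken by name for determinism).
    items = sorted(tidsets, key=lambda it: (-len(tidsets[it]), str(it)))
    supp = [len(tidsets[it]) for it in items]
    n = len(items)

    def upper(i, j):
        # Assumed bound for i < j, i.e. supp[i] >= supp[j].
        return sqrt(supp[j] / supp[i])

    # Max-heap (negated keys) holding the next unexplored pair of each row.
    # Within row i the bound is non-increasing in j, so the heap top is the
    # largest bound among all unexplored pairs.
    frontier = [(-upper(i, i + 1), i, i + 1) for i in range(n - 1)]
    heapq.heapify(frontier)

    best = []  # min-heap of (cosine, i, j) with at most k entries
    while frontier:
        neg_bound, i, j = heapq.heappop(frontier)
        if len(best) == k and -neg_bound <= best[0][0]:
            break  # no unexplored pair can enter the top k
        joint = len(tidsets[items[i]] & tidsets[items[j]])
        cosine = joint / sqrt(supp[i] * supp[j])
        if len(best) < k:
            heapq.heappush(best, (cosine, i, j))
        elif cosine > best[0][0]:
            heapq.heapreplace(best, (cosine, i, j))
        if j + 1 < n:
            heapq.heappush(frontier, (-upper(i, j + 1), i, j + 1))

    result = [(c, items[i], items[j]) for c, i, j in best]
    return sorted(result, key=lambda t: (-t[0], str(t[1]), str(t[2])))


# Hypothetical toy data: five market baskets over three items.
transactions = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
print(top_k_cosine_pairs(transactions, 1))  # [(0.75, 'a', 'b')]
```

In this sketch the exact cosine is computed only for pairs popped from the heap, which is one way to skip pairs whose bound already rules them out; how TOP-MATA itself orders and prunes pairs is described in the paper rather than in the abstract.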