A weighted tag similarity measure based on a collaborative weight model

SMUC '10 | Pub Date: 2010-10-30 | DOI: 10.1145/1871985.1871999

Gokavarapu Srinivas, Niket Tandon, Vasudeva Varma


Citations: 26

Abstract

Measuring semantic relatedness between social tags remains a largely open problem. Given the structure of social bookmarking systems, similarity measures need to be designed from the perspective of those systems. We address the fundamental problem of the weight model for tags, on which every similarity measure is based. We propose a weight model for tagging systems that takes the user dimension into account, unlike existing measures based on tag frequency. Visual analysis of tag clouds suggests that the proposed model yields intuitively better weights than tag frequency does. We also propose a weighted similarity model that is conceptually different from contemporary frequency-based similarity measures. Based on this model, we present weighted variants of several existing measures, such as Dice and cosine similarity. We evaluate the proposed similarity model using Spearman's correlation coefficient, with WordNet as the gold standard. Our method achieves a 20% improvement over traditional similarity measures such as Dice and cosine similarity, and also over recent tag similarity measures such as mutual information with distributional aggregation. Finally, we show the practical effectiveness of the proposed weighted similarity measures by performing search over tagged documents using Social SimRank on a large real-world dataset.
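
The abstract does not give the weight model or the weighted measures in closed form, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes, purely for illustration, that a tag's weight on a resource is the number of distinct users who applied it there (one simple way to bring in the user dimension), and it uses the standard weighted generalizations of cosine and Dice similarity. The names `ANNOTATIONS`, `tag_vectors`, `weighted_cosine`, and `weighted_dice`, as well as the sample data, are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user, resource, tag) bookmarking triples; not data from the paper.
ANNOTATIONS = [
    ("u1", "r1", "python"), ("u2", "r1", "python"), ("u3", "r1", "programming"),
    ("u1", "r2", "python"), ("u1", "r2", "programming"), ("u2", "r2", "programming"),
    ("u2", "r3", "cooking"), ("u3", "r3", "cooking"),
]


def tag_vectors(annotations):
    """Map each tag to {resource: weight}.

    ASSUMPTION: the weight is the number of distinct users who applied the
    tag to the resource. This is a stand-in for a user-aware weight, not
    the weight model proposed in the paper.
    """
    users = defaultdict(set)
    for user, resource, tag in annotations:
        users[(tag, resource)].add(user)
    vectors = defaultdict(dict)
    for (tag, resource), who in users.items():
        vectors[tag][resource] = float(len(who))
    return dict(vectors)


def weighted_cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b[r] for r, w in a.items() if r in b)
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def weighted_dice(a, b):
    """Generalized Dice: 2 * sum of shared minimum weights / total weight."""
    shared = sum(min(w, b[r]) for r, w in a.items() if r in b)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


vectors = tag_vectors(ANNOTATIONS)
cos_pp = weighted_cosine(vectors["python"], vectors["programming"])
dice_pp = weighted_dice(vectors["python"], vectors["programming"])
cos_pc = weighted_cosine(vectors["python"], vectors["cooking"])
```

In this toy data, "python" and "programming" share both of their resources and so score well above "python" and "cooking", which share none. Swapping the distinct-user count for raw tag frequency in `tag_vectors` would give the frequency-based baseline that the abstract contrasts with.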

