Web 2.0 social bookmark selection for tag clustering

2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering. Pub Date: 2013-04-15. DOI: 10.1109/ICPRIME.2013.6496724

S. S. Kumar, H. Inbarani


Citations: 13

Abstract

Tagging is a popular way to annotate Web 2.0 websites. A tag is any user-generated word or phrase that helps to organize Web 2.0 content. The current popularity of Web 2.0 applications poses several important challenges for future data mining and web mining methods. One such challenge is that a large amount of data is generated over a short period. Clustering tag data is computationally expensive because the tag space on many social bookmarking websites is very large. Instead of clustering the whole tag space of Web 2.0 data, tags that are sufficiently frequent in the tag space can be selected for clustering by applying feature selection techniques. The goal of feature selection is to determine a minimal subset of bookmarked URLs from the Web 2.0 data while retaining suitably high accuracy in representing the original bookmarks. Tag clustering is the process of grouping similar tags into the same cluster, and it is important for the success of collaborative tagging services. In this paper, the Unsupervised Quick Reduct feature selection algorithm is applied to find a set of the most commonly tagged bookmarks. Clustering techniques such as Soft Rough Fuzzy clustering and Rough K-Means are then applied to cluster the user-generated tags, and the performance of these clustering approaches is illustrated.
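The abstract does not spell out the selection step, so the following is only a minimal sketch of how an unsupervised quick-reduct style selection might look. It assumes the commonly described formulation in which attributes are added greedily until the mean rough-set dependency degree over all attributes matches that of the full attribute set. The binary tag-by-bookmark matrix, the function names, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group object indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, target):
    """Rough-set dependency degree of attribute `target` on the set `attrs`:
    the fraction of objects whose equivalence class is uniform in `target`."""
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][target] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def mean_dependency(rows, attrs):
    """Average dependency degree over every attribute in the table."""
    n_attrs = len(rows[0])
    return sum(dependency(rows, attrs, y) for y in range(n_attrs)) / n_attrs

def unsupervised_quick_reduct(rows):
    """Greedily add the attribute that most increases the mean dependency
    until it equals the mean dependency of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    full = mean_dependency(rows, all_attrs)
    reduct = []
    best = mean_dependency(rows, reduct)
    while best < full:
        candidate, candidate_score = None, best
        for x in all_attrs:
            if x in reduct:
                continue
            score = mean_dependency(rows, reduct + [x])
            if score > candidate_score:
                candidate, candidate_score = x, score
        if candidate is None:  # defensive: no attribute improves the score
            break
        reduct.append(candidate)
        best = candidate_score
    return sorted(reduct)

# Hypothetical toy data: rows are tags, columns are bookmarks,
# 1 means the tag was applied to that bookmark.
tag_bookmark = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(unsupervised_quick_reduct(tag_bookmark))
```

On this toy matrix the sketch selects bookmarks 0 and 3: bookmark 1 duplicates bookmark 0 and bookmark 2 is its complement, so neither adds information. The reduced columns would then be passed to a clustering step such as Rough K-Means, which is not shown here.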

