Finding Semantic Relationships in Folksonomies
Iman Saleh, Neamat El-Tazi
2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), published 2018-12-01
DOI: https://doi.org/10.1109/WI.2018.00-92
Citations: 3
Abstract
In this paper we study the problem of finding semantic relationships between folksonomy tags. We investigate different methods for embedding tags in a vector space and for finding similarities between them using word embedding vectors. We also present two new methods for embedding tags in a vector space utilizing labeled Latent Dirichlet Allocation (LDA) and Wikipedia category links. Related tags are grouped into communities using an overlapping community detection technique. To evaluate the tag embedding methods, we use three evaluation metrics: two do not require a ground truth dataset, and the third is based on a manually created dataset of ground truth communities. Our results show that representing folksonomy tags as bags of words and embedding this representation in the vector space yields the best results, compared with embedding co-occurring tags only or embedding tags along with the textual content of tagged documents. We also compare word embedding, Latent Semantic Indexing (LSI), and LDA for finding similarities between bag-of-words representations of tags. We show that word embedding outperforms LSI in one representation, while LDA is hard to beat.
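As a rough illustration of the bag-of-words idea described in the abstract, the sketch below represents each tag as a bag of words, embeds the bag by averaging word vectors, and compares tags by cosine similarity. It is not the authors' implementation: the word vectors are made-up toy values, the tag bags are invented, and the names `WORD_VECTORS`, `TAG_BAGS`, `embed_tag`, and `cosine` are hypothetical. Count-weighted averaging is one common way to embed a bag of words and is assumed here, since the abstract does not specify the aggregation.

```python
import math
from collections import Counter

# Toy 3-dimensional word vectors. These are hypothetical values chosen for
# illustration, not trained embeddings.
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "code":   [0.8, 0.2, 0.0],
    "script": [0.7, 0.3, 0.1],
    "recipe": [0.0, 0.9, 0.2],
    "baking": [0.1, 0.8, 0.3],
}

# Hypothetical bags of words associated with each tag (for example, words
# drawn from the documents the tag was applied to).
TAG_BAGS = {
    "programming": Counter(["python", "code", "script", "code"]),
    "scripting":   Counter(["script", "python"]),
    "cooking":     Counter(["recipe", "baking", "recipe"]),
}


def embed_tag(bag):
    """Return the count-weighted average of the word vectors in a tag's bag."""
    dim = len(next(iter(WORD_VECTORS.values())))
    total = [0.0] * dim
    weight = 0
    for word, count in bag.items():
        vec = WORD_VECTORS.get(word)
        if vec is None:
            continue  # skip words that have no vector
        for i, x in enumerate(vec):
            total[i] += count * x
        weight += count
    return [x / weight for x in total] if weight else total


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


TAG_VECTORS = {tag: embed_tag(bag) for tag, bag in TAG_BAGS.items()}

if __name__ == "__main__":
    for a, b in [("programming", "scripting"), ("programming", "cooking")]:
        print(a, b, round(cosine(TAG_VECTORS[a], TAG_VECTORS[b]), 3))
```

With these toy values, "programming" comes out much closer to "scripting" than to "cooking". In a real setting, a pairwise similarity matrix of this kind could serve as the weighted graph on which a community detection step operates.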