A user-oriented semi-supervised probabilistic topic model
Authors: Jing Li, Yongbin Qin, Ruizhang Huang
Published in: 2016 2nd IEEE International Conference on Computer and Communications (ICCC), 2016-10-01
DOI: 10.1109/COMPCOMM.2016.7924706 (https://doi.org/10.1109/COMPCOMM.2016.7924706)
Citations: 1
Abstract
Topic modeling is widely used to mine topics. However, users' individual needs are seldom considered, which runs against the growing importance of personalization. In this work, we propose a user-oriented probabilistic topic model based on Latent Dirichlet Allocation. Words that a user marks as interesting or uninteresting serve as supervised information that captures the user's preferences. A self-learning algorithm that effectively increases the quantity of supervised information is also presented. Because the model is semi-supervised, data with and without attached supervised information are treated differently. For parameter inference, we integrate the Pólya urn model into the Gibbs sampling process to use the different kinds of supervised information efficiently. Experiments on real datasets show that the model outperforms state-of-the-art models.
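The abstract does not state the sampling equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a standard collapsed Gibbs sampler for Latent Dirichlet Allocation and adds a simple Pólya-urn-style promotion: whenever a word from the user's interested set is assigned to a topic, every other interested word receives a fractional count `boost` in that topic. The function name `gibbs_lda_polya`, the `interested` set, and the `boost` parameter are hypothetical and chosen for illustration; uninterested words and the self-learning step are not modeled here.

```python
import random


def gibbs_lda_polya(docs, vocab_size, num_topics, interested, boost=0.5,
                    alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a Polya-urn-style boost (sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    interested: word ids a hypothetical user marked as interesting.
    boost: fractional count given to every other interested word when an
           interested word is assigned to a topic (an assumption, not
           taken from the paper).
    """
    rng = random.Random(seed)
    interested = sorted(interested)
    interested_set = set(interested)

    doc_topic = [[0] * num_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0.0] * vocab_size for _ in range(num_topics)]
    topic_total = [0.0] * num_topics                        # row sums of topic_word
    assignments = []

    def update(d, w, k, sign):
        """Add (sign=+1) or remove (sign=-1) one token of word w in topic k."""
        doc_topic[d][k] += sign
        topic_word[k][w] += sign
        topic_total[k] += sign
        if w in interested_set:
            # Urn-style promotion: related words also gain (or lose) mass.
            for v in interested:
                if v != w:
                    topic_word[k][v] += sign * boost
                    topic_total[k] += sign * boost

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            update(d, w, k, +1)
            zs.append(k)
        assignments.append(zs)

    # Gibbs sweeps: remove a token, resample its topic, add it back.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, k, +1)
                assignments[d][i] = k

    return assignments, doc_topic, topic_word
```

A call such as `gibbs_lda_polya([[0, 1, 2, 0], [3, 4, 3]], vocab_size=5, num_topics=2, interested={0, 1})` returns the per-token topic assignments together with the document-topic and topic-word count tables. Setting `boost=0` reduces the sketch to plain collapsed Gibbs sampling for LDA, which makes the effect of the supervised word set easy to isolate.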