{"title":"Modeling Document Networks with Tree-Averaged Copula Regularization","authors":"Yuan He, Cheng Wang, Changjun Jiang","doi":"10.1145/3018661.3018666","DOIUrl":null,"url":null,"abstract":"Document network is a kind of intriguing dataset which provides both topical (texts) and topological (links) information. Most previous work assumes that documents closely linked with each other share common topics. However, the associations among documents are usually complex, which are not limited to the homophily (i.e., tendency to link to similar others). Actually, the heterophily (i.e., tendency to link to different others) is another pervasive phenomenon in social networks. In this paper, we introduce a new tool, called copula, to separately model the documents and links, so that different copula functions can be applied to capture different correlation patterns. In statistics, a copula is a powerful framework for explicitly modeling the dependence of random variables by separating the marginals and their correlations. Though widely used in Economics, copulas have not been paid enough attention to by researchers in machine learning field. Besides, to further capture the potential associations among the unconnected documents, we apply the tree-averaged copula instead of a single copula function. This improvement makes our model achieve better expressive power, and also more elegant in algebra. We derive efficient EM algorithms to estimate the model parameters, and evaluate the performance of our model on three different datasets. Experimental results show that our approach achieves significant improvements on both topic and link modeling compared with the current state of the art.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3018661.3018666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
Document network is a kind of intriguing dataset which provides both topical (texts) and topological (links) information. Most previous work assumes that documents closely linked with each other share common topics. However, the associations among documents are usually complex, which are not limited to the homophily (i.e., tendency to link to similar others). Actually, the heterophily (i.e., tendency to link to different others) is another pervasive phenomenon in social networks. In this paper, we introduce a new tool, called copula, to separately model the documents and links, so that different copula functions can be applied to capture different correlation patterns. In statistics, a copula is a powerful framework for explicitly modeling the dependence of random variables by separating the marginals and their correlations. Though widely used in Economics, copulas have not been paid enough attention to by researchers in machine learning field. Besides, to further capture the potential associations among the unconnected documents, we apply the tree-averaged copula instead of a single copula function. This improvement makes our model achieve better expressive power, and also more elegant in algebra. We derive efficient EM algorithms to estimate the model parameters, and evaluate the performance of our model on three different datasets. Experimental results show that our approach achieves significant improvements on both topic and link modeling compared with the current state of the art.