Determining the Number of Latent Factors in Statistical Multi-Relational Learning.

Journal of Machine Learning Research (JMLR), published 2019-01-01

Chengchun Shi, Wenbin Lu, Rui Song



Abstract

Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed the RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared with other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. These latent factors can then be used to solve relational learning tasks such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Because of the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish their rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistency when the number of relations is either bounded or diverges at a proper rate relative to the number of entities. Simulations and real data examples show that the proposed information criteria have good finite-sample properties.
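To make the selection problem concrete, the following Python sketch fits a RESCAL-type model for several candidate values of s and keeps the one that minimizes an information criterion. It is a toy illustration under assumptions that do not come from the paper: relations are binary and modelled through a logistic link on the RESCAL score theta_ijk = a_i^T R_k a_j (A is n x s, each R_k is s x s); parameters are fitted by plain gradient ascent instead of the alternating least squares usually used for RESCAL; and the penalty is a BIC-style log(K n^2) multiplied by a naive count of free parameters. All function names (`fit_rescal`, `select_rank`, `simulate`, and so on) are hypothetical. Because the log-likelihood is not concave, the fit may stop at a local optimum. This is not the authors' estimator or their class of criteria.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def softplus(t):
    # log(1 + exp(t)) computed without overflow.
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def rescal_score(A, R, i, j, k):
    # theta_ijk = a_i^T R_k a_j, with A an n x s list of rows and R[k] an s x s core matrix.
    s = len(A[0])
    return sum(A[i][p] * R[k][p][q] * A[j][q] for p in range(s) for q in range(s))

def log_likelihood(Y, A, R):
    # Bernoulli log-likelihood with logistic link (assumption): sum of y * theta - log(1 + e^theta).
    total = 0.0
    for k in range(len(Y)):
        for i in range(len(Y[k])):
            for j in range(len(Y[k][i])):
                t = rescal_score(A, R, i, j, k)
                total += Y[k][i][j] * t - softplus(t)
    return total

def fit_rescal(Y, s, steps=300, lr=0.5, seed=0):
    # Gradient ascent on the averaged log-likelihood. The objective is not concave,
    # so the result may be a local optimum that depends on the random start.
    rng = random.Random(seed)
    K, n = len(Y), len(Y[0])
    A = [[rng.gauss(0, 0.5) for _ in range(s)] for _ in range(n)]
    R = [[[rng.gauss(0, 0.5) for _ in range(s)] for _ in range(s)] for _ in range(K)]
    scale = lr / (K * n * n)
    for _ in range(steps):
        gA = [[0.0] * s for _ in range(n)]
        gR = [[[0.0] * s for _ in range(s)] for _ in range(K)]
        for k in range(K):
            for i in range(n):
                for j in range(n):
                    g = Y[k][i][j] - sigmoid(rescal_score(A, R, i, j, k))  # d loglik / d theta_ijk
                    for p in range(s):
                        for q in range(s):
                            gR[k][p][q] += g * A[i][p] * A[j][q]
                            gA[i][p] += g * R[k][p][q] * A[j][q]  # a_i enters on the left
                            gA[j][q] += g * A[i][p] * R[k][p][q]  # a_j enters on the right
        for i in range(n):
            for p in range(s):
                A[i][p] += scale * gA[i][p]
        for k in range(K):
            for p in range(s):
                for q in range(s):
                    R[k][p][q] += scale * gR[k][p][q]
    return A, R

def num_free_params(n, K, s):
    # n*s entity factors + K*s*s core entries, minus s*s for the invariance
    # A -> A Q, R_k -> Q^{-1} R_k Q^{-T} (Q any invertible s x s matrix).
    return n * s + K * s * s - s * s

def information_criterion(loglik, n, K, s, penalty):
    # Generic form: -2 * loglik + penalty * (number of free parameters).
    return -2.0 * loglik + penalty * num_free_params(n, K, s)

def select_rank(Y, candidates, steps=300, lr=0.5):
    K, n = len(Y), len(Y[0])
    penalty = math.log(K * n * n)  # BIC-style penalty: an illustrative assumption, not the paper's
    scores = {}
    for s in candidates:
        A, R = fit_rescal(Y, s, steps=steps, lr=lr)
        scores[s] = information_criterion(log_likelihood(Y, A, R), n, K, s, penalty)
    return min(candidates, key=scores.get), scores

def simulate(n, K, s, seed=0):
    # Draw a toy binary tensor Y[k][i][j] from a RESCAL-logistic model with s latent factors.
    rng = random.Random(seed)
    A = [[rng.gauss(0, 1) for _ in range(s)] for _ in range(n)]
    R = [[[rng.gauss(0, 1) for _ in range(s)] for _ in range(s)] for _ in range(K)]
    return [[[1 if rng.random() < sigmoid(rescal_score(A, R, i, j, k)) else 0
              for j in range(n)] for i in range(n)] for k in range(K)]
```

For example, `select_rank(simulate(8, 2, 2), [1, 2, 3])` returns the candidate with the smallest criterion value together with the score of every candidate. The subtraction of s^2 in `num_free_params` reflects that A and the R_k are identified only up to an invertible s x s transformation; replacing the BIC-style penalty with another member of a penalty family may change which s is selected.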
