J. Simm, Adam Arany, Pooya Zakeri, Tom Haber, J. Wegner, V. Chupakhin, H. Ceulemans, Y. Moreau
2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6. Published 2017-09-01. DOI: 10.1109/MLSP.2017.8168143 (https://doi.org/10.1109/MLSP.2017.8168143)
Macau: Scalable Bayesian factorization with high-dimensional side information using MCMC
Bayesian matrix factorization is a method of choice for making predictions on large-scale incomplete matrices, owing to the availability of efficient Gibbs sampling schemes and its robustness to overfitting. In this paper, we consider the factorization of large-scale matrices with high-dimensional side information. With standard approaches, however, sampling the link matrix for the side information costs O(F^3) time, where F is the dimensionality of the features. To overcome this limitation, we first propose a prior for the link matrix whose strength is proportional to the scale of the latent variables. Second, using this prior, we derive an efficient sampler whose complexity is linear in the number of non-zeros, O(N_nz), by leveraging Krylov subspace methods such as block conjugate gradient; this allows us to handle million-dimensional side information. We demonstrate the effectiveness of the proposed method on a drug-protein interaction prediction task.
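To illustrate why a Krylov method can avoid the O(F^3) cost, the following is a minimal sketch, not the authors' implementation. It assumes, purely for illustration, that the expensive step reduces to a regularized linear system of the form (X^T X + lam I) beta = b, where X is a sparse side-information matrix with F columns. It uses plain single right-hand-side conjugate gradient rather than the block variant named in the abstract, and it omits the noise injection a real sampler would need. The names `matvec`, `rmatvec`, `cg_solve`, and `lam` are hypothetical.

```python
# Sketch under stated assumptions: conjugate gradient (CG) for
# (X^T X + lam * I) beta = b, with X stored as (row, col, value) triples.
# The F x F matrix X^T X is never formed; each CG iteration touches every
# non-zero of X twice, so its cost is proportional to the number of non-zeros.

def matvec(triples, v, n_rows):
    """Compute X @ v, visiting each non-zero once."""
    out = [0.0] * n_rows
    for i, j, x in triples:
        out[i] += x * v[j]
    return out

def rmatvec(triples, u, n_cols):
    """Compute X^T @ u, visiting each non-zero once."""
    out = [0.0] * n_cols
    for i, j, x in triples:
        out[j] += x * u[i]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cg_solve(triples, n_rows, n_cols, b, lam, tol=1e-10, max_iter=1000):
    """Solve (X^T X + lam I) beta = b using only sparse products with X."""
    def apply_K(v):
        xtxv = rmatvec(triples, matvec(triples, v, n_rows), n_cols)
        return [s + lam * t for s, t in zip(xtxv, v)]

    beta = [0.0] * n_cols
    r = list(b)              # residual b - K beta for the initial beta = 0
    p = list(r)              # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:  # stop once the residual norm is small
            break
        Kp = apply_K(p)
        alpha = rs / dot(p, Kp)
        beta = [bi + alpha * pi for bi, pi in zip(beta, p)]
        r = [ri - alpha * ki for ri, ki in zip(r, Kp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return beta

# Toy data (made up): 3 rows, F = 2 features, 4 non-zeros.
X = [(0, 0, 1.0), (1, 0, 2.0), (1, 1, 1.0), (2, 1, 3.0)]
b = [1.0, 2.0]
beta = cg_solve(X, n_rows=3, n_cols=2, b=b, lam=0.5)
```

A direct solve would build and factorize the dense F x F system, which is where the cubic cost comes from. The sketch only ever multiplies by X and X^T, so the per-iteration work scales with the non-zeros of X; the total cost also depends on how many iterations are needed to converge.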