A Multimodal Multimedia Retrieval Model Based on pLSA
Yu Zhang, Ye Yuan, Guoren Wang
2014 11th Web Information System and Application Conference, 2014-09-12
DOI: 10.1109/WISA.2014.14 (https://doi.org/10.1109/WISA.2014.14)
Citations: 1
Abstract
In this paper, we propose a multimodal multimedia retrieval model based on probabilistic latent semantic analysis (pLSA). First, we employ pLSA to model the generative processes of the texts and the images in the same documents separately. Then we use multivariate linear regression to analyze the correlation between the textual and visual representations, and apply the ordinary least squares (OLS) method to estimate the regression matrix, which can be used to transform between textual and visual modal data. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed model.
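The regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pLSA topic representations are replaced here by synthetic Dirichlet samples, the function names `fit_regression_matrix` and `rank_images` are hypothetical, and ranking by cosine similarity is an assumption that the abstract does not state. The sketch only shows how an OLS estimate of a regression matrix can map text representations into the image representation space.

```python
import numpy as np


def fit_regression_matrix(text_topics, image_topics):
    """OLS estimate of W minimizing ||text_topics @ W - image_topics||_F.

    text_topics:  (n_docs, k_text) array, one row per document
    image_topics: (n_docs, k_image) array, same document order
    """
    W, *_ = np.linalg.lstsq(text_topics, image_topics, rcond=None)
    return W


def rank_images(query_text_topics, W, image_topics):
    """Map a text representation into image space and rank all images.

    Ranking uses cosine similarity (an assumption made for this sketch).
    Returns image indices ordered from most to least similar.
    """
    predicted = query_text_topics @ W
    predicted = predicted / np.linalg.norm(predicted)
    normalized = image_topics / np.linalg.norm(image_topics, axis=1, keepdims=True)
    return np.argsort(-(normalized @ predicted))


# Synthetic stand-ins for pLSA topic proportions (assumed, not from the paper).
rng = np.random.default_rng(0)
n_docs, k_text, k_image = 200, 8, 6
T = rng.dirichlet(np.ones(k_text), size=n_docs)  # text representations
W_true = rng.random((k_text, k_image))           # hypothetical true mapping
V = T @ W_true                                   # noise-free image representations

W_hat = fit_regression_matrix(T, V)
ranking = rank_images(T[0], W_hat, V)
```

Because the synthetic image representations are generated without noise, OLS recovers the mapping up to floating-point error. With real pLSA outputs the fit would only be approximate, and a second regression matrix fitted in the opposite direction would be needed for image-to-text queries.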