{"title":"BiModal latent dirichlet allocation for text and image","authors":"Xiaofeng Liao, Q. Jiang, Wei Zhang, Kai Zhang","doi":"10.1109/ICIST.2014.6920582","DOIUrl":null,"url":null,"abstract":"A BiModal Latent Dirichlet Allocation Model(BM-LDA) is proposed to learn a unified representation of data that comes from both the textual and visual modalities together. The model is able to form a unified representation that mixs both the textual and visual modalities. Based on the assumption, that the images and its surrounding text share a same topic, the model learns a posterior probability density in the space of latent variable of topics that bridging over the observed multi modality inputs. It maps the high dimensional space consist of the observed variables from both modalities to a low dimensional space of topcis. Experimental result on ImageCLEF data set, which consists of bi-modality data of images and surrounding text, shows our new BM-LDA model can get a fine representation for the multi-modality data, which is useful for tasks such as retrieval and classification.","PeriodicalId":306383,"journal":{"name":"2014 4th IEEE International Conference on Information Science and Technology","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 4th IEEE International Conference on Information Science and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIST.2014.6920582","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
A BiModal Latent Dirichlet Allocation model (BM-LDA) is proposed to learn a unified representation of data drawn from the textual and visual modalities together. The model forms a single representation that mixes both modalities. Based on the assumption that an image and its surrounding text share the same topics, the model learns a posterior probability density over the latent topic variables that bridges the observed multi-modal inputs. It maps the high-dimensional space of observed variables from both modalities to a low-dimensional space of topics. Experimental results on the ImageCLEF dataset, which consists of bi-modal data of images and their surrounding text, show that the BM-LDA model yields a good representation of the multi-modal data, useful for tasks such as retrieval and classification.
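To illustrate the shared-topic assumption the abstract describes, here is a minimal sketch of a bimodal LDA generative process: both the text words and the quantized visual words of a document are drawn from per-topic distributions conditioned on one shared topic proportion. All hyperparameters, vocabulary sizes, and function names below are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch of a shared-topic bimodal LDA generative process.
# Hyperparameters and sizes are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)

K = 10           # number of latent topics (assumed)
V_text = 500     # text vocabulary size (assumed)
V_img = 200      # visual-word codebook size (assumed)
alpha, beta = 0.1, 0.01  # symmetric Dirichlet priors (assumed)

# Per-topic distributions over words, one set per modality.
phi_text = rng.dirichlet(np.full(V_text, beta), size=K)  # K x V_text
phi_img = rng.dirichlet(np.full(V_img, beta), size=K)    # K x V_img

def generate_document(n_words=50, n_visual=30):
    """Generate one image+text document. A single topic proportion
    `theta` drives both modalities, which is what couples them."""
    theta = rng.dirichlet(np.full(K, alpha))
    # Text side: pick a topic per word, then a word from that topic.
    z_text = rng.choice(K, size=n_words, p=theta)
    words = [rng.choice(V_text, p=phi_text[z]) for z in z_text]
    # Image side: same theta, but topic-specific visual-word distributions.
    z_img = rng.choice(K, size=n_visual, p=theta)
    visual = [rng.choice(V_img, p=phi_img[z]) for z in z_img]
    return words, visual, theta

words, visual, theta = generate_document()
# theta is the K-dimensional joint representation of the document,
# versus the V_text + V_img dimensions of the raw observations.
print("latent dimension:", K, "observed dimension:", V_text + V_img)
```

Inference would invert this process, estimating the posterior over `theta` (the low-dimensional joint representation) from a document's observed words and visual words; the sketch above only shows the generative direction.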