Multi-modal learning for social image classification
Chunyang Liu, Xu Zhang, Xiong Li, Rui Li, Xiaoming Zhang, Wen-Han Chao
2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), August 2016. DOI: 10.1109/FSKD.2016.7603345

Abstract: There is growing interest in social image classification because of its importance in web-based image applications. Although many approaches to image classification exist, integrating the multi-modal content of social images remains a difficult problem, since the textual content and visual content are represented in two heterogeneous feature spaces. In this study, we propose a multi-modal learning algorithm that fuses the multiple features seamlessly through their correlation. Specifically, we learn a linear classification module for each of the two feature types, and the two modules are then integrated via a joint model with l2 normalization. Under the joint model, classification based on visual features is reinforced by classification based on textual features, and vice versa. A test image can then be classified using both textual and visual features by combining the outputs of the two classifiers. To evaluate the approach, we conduct experiments on real-world datasets, and the results show the superiority of our proposed algorithm over the baselines.
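The abstract describes a late-fusion scheme: one linear classifier per modality, with the two score vectors combined after l2 normalization. The sketch below illustrates that general idea on synthetic data; the feature dimensions, classifier choice (logistic regression), and the exact fusion rule are assumptions for illustration, not the paper's joint model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two heterogeneous feature spaces:
# textual features (e.g. tag vectors) and visual features.
n = 200
y = rng.integers(0, 2, n)
X_text = rng.normal(size=(n, 50)) + y[:, None] * 0.5
X_vis = rng.normal(size=(n, 30)) + y[:, None] * 0.5

# One linear classification module per modality.
clf_text = LogisticRegression().fit(X_text, y)
clf_vis = LogisticRegression().fit(X_vis, y)

def fuse_predict(xt, xv):
    """Combine the two classifiers' decision scores after l2-normalizing
    each score vector, so neither modality dominates by magnitude alone."""
    st = clf_text.decision_function(xt)
    sv = clf_vis.decision_function(xv)
    st = st / (np.linalg.norm(st) + 1e-12)
    sv = sv / (np.linalg.norm(sv) + 1e-12)
    return (st + sv > 0).astype(int)

pred = fuse_predict(X_text, X_vis)
acc = (pred == y).mean()
```

In this hedged sketch each modality can correct the other: an image whose visual score is weak but whose textual score is strong is still classified correctly after fusion, which is the reinforcement effect the abstract describes.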