{"title":"Generalization or Instantiation?: Estimating the Relative Abstractness between Images and Text","authors":"Qibin Zheng, Xiaoguang Ren, Yi Liu, Wei Qin","doi":"10.1145/3404555.3404610","DOIUrl":null,"url":null,"abstract":"Learning from multi-modal data is very often in current data mining and knowledge management applications. However, the information imbalance between modalities brings challenges for many multi-modal learning tasks, such as cross-modal retrieval, image captioning, and image synthesis. Understanding the cross-modal information gap is an important foundation for designing models and choosing the evaluating criteria of those applications. Especially for text and image data, existing researches have proposed the abstractness to measure the information imbalance. They evaluate the abstractness disparity by training a classifier using the manually annotated multi-modal sample pairs. However, these methods ignore the impact of the intra-modal relationship on the inter-modal abstractness; besides, the annotating process is very labor-intensive, and the quality cannot be guaranteed. In order to evaluate the text-image relationship more comprehensively and reduce the cost of evaluating, we propose the relative abstractness index (RAI) to measure the abstractness between multi-modal items, which measures the abstractness of a sample according to its certainty of differentiating the items of another modality. Besides, we proposed a cycled generating model to compute the RAI values between images and text. In contrast to existing works, the proposed index can better describe the image-text information disparity, and its computing process needs no annotated training samples.","PeriodicalId":220526,"journal":{"name":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3404555.3404610","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Learning from multi-modal data is common in current data mining and knowledge management applications. However, the information imbalance between modalities poses challenges for many multi-modal learning tasks, such as cross-modal retrieval, image captioning, and image synthesis. Understanding the cross-modal information gap is an important foundation for designing models and choosing evaluation criteria for these applications. For text and image data in particular, existing studies have proposed abstractness as a measure of the information imbalance; they evaluate the abstractness disparity by training a classifier on manually annotated multi-modal sample pairs. However, these methods ignore the impact of the intra-modal relationship on inter-modal abstractness; moreover, the annotation process is labor-intensive and its quality cannot be guaranteed. To evaluate the text-image relationship more comprehensively and reduce the cost of evaluation, we propose the relative abstractness index (RAI), which measures the abstractness of a sample by its certainty in differentiating the items of another modality. We also propose a cycled generating model to compute RAI values between images and text. In contrast to existing works, the proposed index better describes the image-text information disparity, and its computation requires no annotated training samples.
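The sketch below is only a minimal illustration of the certainty-based idea stated in the abstract, not the paper's cycled generating model: it assumes a sample's abstractness can be proxied by the (inverse) sharpness of its similarity distribution over items of the other modality, and all function names, the entropy-based certainty score, and the random embeddings are hypothetical choices introduced here for illustration.

```python
import numpy as np

def certainty(query_vec, other_modality_vecs, temperature=1.0):
    """Certainty of a query embedding in differentiating items of the other
    modality, taken here as 1 minus the normalized entropy of its softmax
    similarity distribution (an illustrative choice, not the paper's exact
    formulation)."""
    sims = other_modality_vecs @ query_vec / temperature
    sims -= sims.max()                              # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    return 1.0 - entropy / np.log(len(probs))       # 1 = very certain, 0 = uniform

def relative_abstractness_index(img_vec, txt_vec, txt_bank, img_bank):
    """Hypothetical RAI: positive when the image differentiates candidate texts
    less certainly than its paired text differentiates candidate images, i.e.
    the image is relatively more abstract than the text."""
    img_certainty = certainty(img_vec, txt_bank)    # image vs. candidate texts
    txt_certainty = certainty(txt_vec, img_bank)    # text vs. candidate images
    return txt_certainty - img_certainty

# Toy usage with random vectors standing in for features that would come from
# image/text encoders (or, in the paper, a cycled generating model).
rng = np.random.default_rng(0)
img, txt = rng.normal(size=64), rng.normal(size=64)
txt_bank, img_bank = rng.normal(size=(100, 64)), rng.normal(size=(100, 64))
print(relative_abstractness_index(img, txt, txt_bank, img_bank))
```

Under this reading, a sharply peaked similarity distribution signals a concrete, easily matched sample, while a flat distribution signals an abstract one; the index compares the two directions so that no annotated abstractness labels are needed.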