Image Annotation Retrieval with Text-Domain Label Denoising
Zachary Seymour, Zhongfei Zhang
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018-06-05
DOI: https://doi.org/10.1145/3206025.3206063
This work explores the problem of making user-generated text data, in the form of noisy tags, usable for tasks such as automatic image annotation and image retrieval by denoising the data. Earlier work in this area has focused on filtering out noisy, sparse, or incorrect tags by representing each image as the accumulation of the tags of its nearest neighbors in the visual space. However, this imposes an expensive preprocessing step that must be repeated for each new set of images and tags, and it relies on assumptions about how the images have been labelled that we find do not always hold. We instead propose a technique for calculating the probability that each tag is relevant to a given image, relying solely on information in the text domain, namely widely available pretrained continuous word embeddings. By first clustering the word embeddings of the tags, we calculate a set of weights representing the probability that each tag is meaningful to the image content. Given the tags denoised in this way, we use kernel canonical correlation analysis (KCCA) to learn a semantic space into which we can project to retrieve relevant tags for unseen images or to retrieve images for unseen tags. This work also explores the deficiencies in how the existing KCCA literature uses continuous word embeddings for automatic image annotation and introduces a new method for constructing textual kernel matrices from these word vectors that improves tag retrieval results for both user-generated tags and expert labels.
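The abstract does not give the weighting formula, so the following is only an illustrative sketch of the general idea of clustering tag embeddings and deriving per-tag weights from the clusters. It rests on several assumptions that are not from the paper: the two-dimensional vectors in `TOY_EMBEDDINGS` are made up and stand in for real pretrained embeddings, plain k-means is used as the clustering step, and a tag's weight is taken to be the fraction of the image's tags that share its cluster. The function names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: made-up 2-D vectors standing in for pretrained word embeddings.
TOY_EMBEDDINGS = {
    "dog": [1.0, 0.1], "puppy": [0.9, 0.2], "cat": [0.8, 0.0],
    "beach": [0.1, 1.0], "ocean": [0.0, 0.9], "sand": [0.2, 0.8],
    "nikon": [-1.0, -0.9], "canon": [-0.9, -1.0],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans_labels(points, k, iterations=20):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all current ones.
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points]


def tag_weights(image_tags, cluster_of):
    """ASSUMPTION: weight = share of the image's tags in the same cluster."""
    counts = Counter(cluster_of[tag] for tag in image_tags)
    return {tag: counts[cluster_of[tag]] / len(image_tags) for tag in image_tags}


vocabulary = list(TOY_EMBEDDINGS)
labels = kmeans_labels([TOY_EMBEDDINGS[tag] for tag in vocabulary], k=3)
cluster_of = dict(zip(vocabulary, labels))

weights = tag_weights(["dog", "puppy", "nikon"], cluster_of)
print(weights)
```

In this toy run, "dog" and "puppy" fall in the same cluster and each receive a weight of 2/3, while the camera-brand tag "nikon" sits alone in its cluster and receives 1/3. The point of the sketch is only that such weights can be computed from the text side alone, without any visual nearest-neighbor search; the actual weighting used in the paper may differ.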