Interactive re-ranking for cross-modal retrieval based on object-wise question answering
Rintaro Yanagi, Ren Togo, Takahiro Ogawa, M. Haseyama
Proceedings of the 2nd ACM International Conference on Multimedia in Asia, 2021. DOI: 10.1145/3444685.3446290
Citations: 4
Abstract
Cross-modal retrieval methods retrieve desired images from a query text by learning relationships between texts and images. Because queries are easy to prepare, this retrieval approach is one of the most convenient. Recent cross-modal retrieval methods are accurate when users input a query text that uniquely identifies the desired image. However, users frequently input ambiguous query texts, and these ambiguous queries make it difficult to obtain the desired images. To alleviate this difficulty, in this paper, we propose a novel interactive cross-modal retrieval method based on question answering (QA) with users. The proposed method analyses the candidate images and asks users about information that can effectively narrow the retrieval candidates. By simply answering the generated questions, users can reach their desired images even from an ambiguous query text. Experimental results show the effectiveness of the proposed method.