{"title":"Fashion-Specific Ambiguous Expression Interpretation with Partial Visual-Semantic Embedding","authors":"Ryotaro Shimizu, Takuma Nakamura, M. Goto","doi":"10.1109/CVPRW59228.2023.00353","DOIUrl":null,"url":null,"abstract":"A novel technology named fashion intelligence system has been proposed to quantify ambiguous expressions unique to fashion, such as \"casual,\" \"adult-casual,\" and \"office-casual,\" and to support users’ understanding of fashion. However, the existing visual-semantic embedding (VSE) model, which is the basis of its system, does not support situations in which images are composed of multiple parts such as hair, tops, pants, skirts, and shoes. We propose partial VSE, which enables sensitive learning for each part of the fashion outfits. This enables five types of practical functionalities, particularly image-retrieval tasks in which changes are made only to the specified parts and image-reordering tasks that focus on the specified parts by the single model. Based on both the multiple unique qualitative and quantitative evaluation experiments, we show the effectiveness of the proposed model.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW59228.2023.00353","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
A novel technology called the fashion intelligence system has been proposed to quantify ambiguous expressions unique to fashion, such as "casual," "adult-casual," and "office-casual," and to support users' understanding of fashion. However, the existing visual-semantic embedding (VSE) model on which this system is based does not support situations in which images are composed of multiple parts, such as hair, tops, pants, skirts, and shoes. We propose partial VSE, which enables part-sensitive learning for each element of a fashion outfit. With a single model, this enables five practical functionalities, notably image-retrieval tasks in which changes are made only to the specified parts and image-reordering tasks that focus on the specified parts. Through multiple qualitative and quantitative evaluation experiments, we demonstrate the effectiveness of the proposed model.
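
The abstract describes per-part embedding only at a high level. The sketch below is a minimal illustration, not the authors' implementation, of how a partial VSE model might project each outfit part and a tag expression into a shared space and score them part by part; the part list, feature dimensions, text encoder, and cosine scoring rule are all assumptions made for illustration.

```python
# Minimal illustrative sketch of a partial visual-semantic embedding layout.
# Assumptions (not from the paper): PyTorch, per-part visual features already
# extracted, and a hypothetical part vocabulary and scoring rule.
import torch
import torch.nn as nn
import torch.nn.functional as F

PARTS = ["hair", "tops", "pants", "skirts", "shoes"]  # example part vocabulary

class PartialVSE(nn.Module):
    def __init__(self, visual_dim=2048, text_vocab=10000, embed_dim=512):
        super().__init__()
        # One projection head per outfit part, so each part is embedded
        # separately into the joint visual-semantic space.
        self.visual_heads = nn.ModuleDict(
            {p: nn.Linear(visual_dim, embed_dim) for p in PARTS}
        )
        # Simple averaged tag embedding stands in for the text side.
        self.text_embed = nn.EmbeddingBag(text_vocab, embed_dim, mode="mean")

    def forward(self, part_feats, tag_ids):
        # part_feats: dict part -> (batch, visual_dim) visual features
        # tag_ids:    (batch, n_tags) indices of fashion tags, e.g. "office-casual"
        t = F.normalize(self.text_embed(tag_ids), dim=-1)            # (batch, embed_dim)
        scores = {}
        for p in PARTS:
            v = F.normalize(self.visual_heads[p](part_feats[p]), dim=-1)
            scores[p] = (v * t).sum(-1)                              # per-part cosine similarity
        # Part-wise scores make it possible to retrieve or reorder images
        # while focusing only on a specified part.
        return scores

# Example usage with random features
if __name__ == "__main__":
    model = PartialVSE()
    feats = {p: torch.randn(4, 2048) for p in PARTS}
    tags = torch.randint(0, 10000, (4, 3))
    print({p: s.shape for p, s in model(feats, tags).items()})
```

Keeping a separate projection head per part is one simple way to let a single model answer both part-restricted retrieval and part-focused reordering queries, since each part contributes its own similarity score to the shared tag embedding.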