Towards Attribute-Controlled Fashion Image Captioning
Chen Cai, Kim-Hui Yap, Suchen Wang
ACM Transactions on Multimedia Computing, Communications and Applications (JCR Q1, Computer Science, Information Systems; Impact Factor 5.2)
DOI: 10.1145/3671000 (https://doi.org/10.1145/3671000)
Published: 2024-06-05
Citations: 0
Abstract
Fashion image captioning is a critical task in the fashion industry that aims to automatically generate product descriptions for fashion items. However, existing fashion image captioning models predict a fixed caption for a particular fashion item once deployed, which does not cater to individual preferences. We explore a controllable approach to fashion image captioning that allows users to specify a few semantic attributes to guide caption generation. Our approach utilizes semantic attributes as a control signal, giving users the ability to specify particular fashion attributes (e.g., stitch, knit, sleeve) and styles (e.g., cool, classic, fresh) that they want the model to incorporate when generating captions. By providing this level of customization, our approach produces more personalized and targeted captions that suit individual preferences. To evaluate the effectiveness of the proposed approach, we clean, filter, and assemble a new fashion image caption dataset, FACAD170K, from the current FACAD dataset. This dataset facilitates learning and enables us to investigate the effectiveness of our approach. Our results demonstrate that the proposed approach outperforms existing fashion image captioning models as well as conventional captioning methods. In addition, we validate the effectiveness of the proposed method on the MSCOCO and Flickr30K captioning datasets, where it achieves competitive performance.
Journal Description:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. As an archival journal, it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.