{"title":"Channel and spatial attention mechanism for fashion image captioning","authors":"Bao T. Nguyen, S. T. Nguyen, Anh H. Vo","doi":"10.11591/ijece.v13i5.pp5833-5842","DOIUrl":null,"url":null,"abstract":"Image captioning aims to automatically generate one or more description sentences for a given input image. Most of the existing captioning methods use encoder-decoder model which mainly focus on recognizing and capturing the relationship between objects appearing in the input image. However, when generating captions for fashion images, it is important to not only describe the items and their relationships, but also mention attribute features of clothes (shape, texture, style, fabric, and more). In this study, one novel model is proposed for fashion image captioning task which can capture not only the items and their relationship, but also their attribute features. Two different attention mechanisms (spatial-attention and channel-wise attention) is incorporated to the traditional encoder-decoder model, which dynamically interprets the caption sentence in multi-layer feature map in addition to the depth dimension of the feature map. We evaluate our proposed architecture on Fashion-Gen using three different metrics (CIDEr, ROUGE-L, and BLEU-1), and achieve the scores of 89.7, 50.6 and 45.6, respectively. Based on experiments, our proposed method shows significant performance improvement for the task of fashion-image captioning, and outperforms other state-of-the-art image captioning methods.","PeriodicalId":38060,"journal":{"name":"International Journal of Electrical and Computer Engineering","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Electrical and Computer Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.11591/ijece.v13i5.pp5833-5842","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0
Abstract
Image captioning aims to automatically generate one or more description sentences for a given input image. Most of the existing captioning methods use encoder-decoder model which mainly focus on recognizing and capturing the relationship between objects appearing in the input image. However, when generating captions for fashion images, it is important to not only describe the items and their relationships, but also mention attribute features of clothes (shape, texture, style, fabric, and more). In this study, one novel model is proposed for fashion image captioning task which can capture not only the items and their relationship, but also their attribute features. Two different attention mechanisms (spatial-attention and channel-wise attention) is incorporated to the traditional encoder-decoder model, which dynamically interprets the caption sentence in multi-layer feature map in addition to the depth dimension of the feature map. We evaluate our proposed architecture on Fashion-Gen using three different metrics (CIDEr, ROUGE-L, and BLEU-1), and achieve the scores of 89.7, 50.6 and 45.6, respectively. Based on experiments, our proposed method shows significant performance improvement for the task of fashion-image captioning, and outperforms other state-of-the-art image captioning methods.
期刊介绍:
International Journal of Electrical and Computer Engineering (IJECE) is the official publication of the Institute of Advanced Engineering and Science (IAES). The journal is open to submission from scholars and experts in the wide areas of electrical, electronics, instrumentation, control, telecommunication and computer engineering from the global world. The journal publishes original papers in the field of electrical, computer and informatics engineering which covers, but not limited to, the following scope: -Electronics: Electronic Materials, Microelectronic System, Design and Implementation of Application Specific Integrated Circuits (ASIC), VLSI Design, System-on-a-Chip (SoC) and Electronic Instrumentation Using CAD Tools, digital signal & data Processing, , Biomedical Transducers and instrumentation, Medical Imaging Equipment and Techniques, Biomedical Imaging and Image Processing, Biomechanics and Rehabilitation Engineering, Biomaterials and Drug Delivery Systems; -Electrical: Electrical Engineering Materials, Electric Power Generation, Transmission and Distribution, Power Electronics, Power Quality, Power Economic, FACTS, Renewable Energy, Electric Traction, Electromagnetic Compatibility, High Voltage Insulation Technologies, High Voltage Apparatuses, Lightning Detection and Protection, Power System Analysis, SCADA, Electrical Measurements; -Telecommunication: Modulation and Signal Processing for Telecommunication, Information Theory and Coding, Antenna and Wave Propagation, Wireless and Mobile Communications, Radio Communication, Communication Electronics and Microwave, Radar Imaging, Distributed Platform, Communication Network and Systems, Telematics Services and Security Network; -Control[...] -Computer and Informatics[...]