{"title":"一种新的深度学习模型用于时尚行业图像说明的准确预测","authors":"Pulkit Dwivedi, Anushka Upadhyaya","doi":"10.1109/Confluence52989.2022.9734171","DOIUrl":null,"url":null,"abstract":"As the need for automation in the IT sector is growing, several fashion companies are employing models that can create appropriate descriptions for product images. This will assist buyers to better understand the goods, resulting in increased sales for the apparel company. For creating the image descriptions, the researchers used a variety of feature extraction approaches, including convolution neural networks with several layers like VGG-16 and VGG-19. Once the image features are extracted using these convolution neural network (CNN) models, processing of text data is done using a recurrent neural network (RNN) that represents the input sequence of text as a fixed length output vector. Finally, both the vector outputs obtained from the digital image and its description are combined to train the image caption generator model. In this work, we put forward a smaller 5 layer convolution neural network (CNN-5) and compared it with transfer learning models like VGG-16 and VGG-19. The experiments were carried out on the Fashion MNIST dataset, which consists 70,000 gray scale images of size of 28x28 pixels. Each image is linked to one of ten labels (0-9) that represent ten different fashion items. We compared the performance of the proposed methodology as well as the state-of-the-art models using Bilingual Evaluation Understudy: BLEU-I, BLEU-2, BLEU-3 and BLEU-4 scores. The research demonstrates that a smaller layered convolution neural network can reach a similar degree of accuracy for the Fashion MNIST dataset as compared to state-of-the-art methods.","PeriodicalId":261941,"journal":{"name":"2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"A Novel Deep Learning Model for Accurate Prediction of Image Captions in Fashion Industry\",\"authors\":\"Pulkit Dwivedi, Anushka Upadhyaya\",\"doi\":\"10.1109/Confluence52989.2022.9734171\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the need for automation in the IT sector is growing, several fashion companies are employing models that can create appropriate descriptions for product images. This will assist buyers to better understand the goods, resulting in increased sales for the apparel company. For creating the image descriptions, the researchers used a variety of feature extraction approaches, including convolution neural networks with several layers like VGG-16 and VGG-19. Once the image features are extracted using these convolution neural network (CNN) models, processing of text data is done using a recurrent neural network (RNN) that represents the input sequence of text as a fixed length output vector. Finally, both the vector outputs obtained from the digital image and its description are combined to train the image caption generator model. In this work, we put forward a smaller 5 layer convolution neural network (CNN-5) and compared it with transfer learning models like VGG-16 and VGG-19. The experiments were carried out on the Fashion MNIST dataset, which consists 70,000 gray scale images of size of 28x28 pixels. Each image is linked to one of ten labels (0-9) that represent ten different fashion items. 
We compared the performance of the proposed methodology as well as the state-of-the-art models using Bilingual Evaluation Understudy: BLEU-I, BLEU-2, BLEU-3 and BLEU-4 scores. The research demonstrates that a smaller layered convolution neural network can reach a similar degree of accuracy for the Fashion MNIST dataset as compared to state-of-the-art methods.\",\"PeriodicalId\":261941,\"journal\":{\"name\":\"2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)\",\"volume\":\"85 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/Confluence52989.2022.9734171\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Confluence52989.2022.9734171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Novel Deep Learning Model for Accurate Prediction of Image Captions in Fashion Industry
As the need for automation in the IT sector grows, several fashion companies are employing models that can create appropriate descriptions for product images. This helps buyers better understand the goods, resulting in increased sales for the apparel company. For creating the image descriptions, researchers have used a variety of feature extraction approaches, including deep convolutional neural networks such as VGG-16 and VGG-19. Once the image features are extracted using these convolutional neural network (CNN) models, the text data is processed using a recurrent neural network (RNN) that represents the input sequence of text as a fixed-length output vector. Finally, the vector outputs obtained from the digital image and from its description are combined to train the image caption generator model. In this work, we put forward a smaller five-layer convolutional neural network (CNN-5) and compare it with transfer learning models such as VGG-16 and VGG-19. The experiments were carried out on the Fashion MNIST dataset, which consists of 70,000 grayscale images of 28×28 pixels. Each image is linked to one of ten labels (0-9) that represent ten different fashion items. We compared the performance of the proposed methodology with that of state-of-the-art models using Bilingual Evaluation Understudy (BLEU) metrics: BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores. The research demonstrates that a convolutional neural network with fewer layers can reach a degree of accuracy on the Fashion MNIST dataset similar to that of state-of-the-art methods.
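The pipeline described in the abstract (a CNN that encodes the image into a feature vector, an RNN that encodes the caption text into a fixed-length vector, and a merge step that combines the two to predict caption words) can be sketched as below. This is a minimal illustrative Keras/TensorFlow sketch, not the authors' implementation: the layer sizes, vocabulary size, caption length, and the exact CNN-5 layout are assumptions, since the abstract does not specify them.

```python
# Minimal sketch of a CNN + RNN caption generator, assuming Keras/TensorFlow.
# All sizes below are illustrative placeholders, not values reported in the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 5000      # assumed caption vocabulary size
MAX_CAPTION_LEN = 20   # assumed maximum caption length
EMBED_DIM = 128        # assumed embedding / hidden size

# Image branch: a small five-weight-layer CNN encoder (a stand-in for the
# paper's CNN-5; the real architecture is not given in the abstract).
image_in = layers.Input(shape=(28, 28, 1), name="image")
x = layers.Conv2D(32, 3, activation="relu", padding="same")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
x = layers.GlobalAveragePooling2D()(x)
image_vec = layers.Dense(EMBED_DIM, activation="relu")(x)

# Text branch: an RNN encodes the partial caption into a fixed-length vector.
caption_in = layers.Input(shape=(MAX_CAPTION_LEN,), name="caption")
e = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)
caption_vec = layers.LSTM(EMBED_DIM)(e)

# Merge the two fixed-length vectors and predict the next caption word.
merged = layers.add([image_vec, caption_vec])
h = layers.Dense(EMBED_DIM, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(h)

model = Model(inputs=[image_in, caption_in], outputs=next_word)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

For evaluation, the abstract reports BLEU-1 through BLEU-4 scores. A hedged example of computing these with NLTK's corpus_bleu follows; the tokenized captions are invented placeholders, not outputs of the model above.

```python
# Illustrative BLEU-1..BLEU-4 scoring with NLTK; captions below are made up.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["a", "red", "ankle", "boot"]]]   # reference caption(s) per image
hypotheses = [["a", "red", "boot"]]              # generated caption per image
smooth = SmoothingFunction().method1

weights = {
    "BLEU-1": (1.0, 0, 0, 0),
    "BLEU-2": (0.5, 0.5, 0, 0),
    "BLEU-3": (1 / 3, 1 / 3, 1 / 3, 0),
    "BLEU-4": (0.25, 0.25, 0.25, 0.25),
}
for name, w in weights.items():
    score = corpus_bleu(references, hypotheses, weights=w, smoothing_function=smooth)
    print(name, round(score, 4))
```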