Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection

Mahdieh Darvish, Mahsa Pouramini, H. Bahador
{"title":"基于生成对抗网络和面部地标检测的细粒度图像分类","authors":"Mahdieh Darvish, Mahsa Pouramini, H. Bahador","doi":"10.1109/MVIP53647.2022.9738759","DOIUrl":null,"url":null,"abstract":"Fine-grained classification remains a challenging task because distinguishing categories needs learning complex and local differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although the recent Vision Transformer models achieve high performance, they need an extensive volume of input data. To encounter this problem, we made the best use of GAN-based data augmentation to generate extra dataset instances. Oxford-IIIT Pets was our dataset of choice for this experiment. It consists of 37 breeds of cats and dogs with variations in scale, poses, and lighting, which intensifies the difficulty of the classification task. Furthermore, we enhanced the performance of the recent Generative Adversarial Network (GAN), StyleGAN2-ADA model to generate more realistic images while preventing overfitting to the training set. We did this by training a customized version of MobileNetV2 to predict animal facial landmarks; then, we cropped images accordingly. Lastly, we combined the synthetic images with the original dataset and compared our proposed method with standard GANs augmentation and no augmentation with different subsets of training data. We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) Model. Code is available at: https://github.com/mahdi-darvish/GAN-augmented-pet-classifler","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection\",\"authors\":\"Mahdieh Darvish, Mahsa Pouramini, H. Bahador\",\"doi\":\"10.1109/MVIP53647.2022.9738759\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Fine-grained classification remains a challenging task because distinguishing categories needs learning complex and local differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although the recent Vision Transformer models achieve high performance, they need an extensive volume of input data. To encounter this problem, we made the best use of GAN-based data augmentation to generate extra dataset instances. Oxford-IIIT Pets was our dataset of choice for this experiment. It consists of 37 breeds of cats and dogs with variations in scale, poses, and lighting, which intensifies the difficulty of the classification task. Furthermore, we enhanced the performance of the recent Generative Adversarial Network (GAN), StyleGAN2-ADA model to generate more realistic images while preventing overfitting to the training set. We did this by training a customized version of MobileNetV2 to predict animal facial landmarks; then, we cropped images accordingly. Lastly, we combined the synthetic images with the original dataset and compared our proposed method with standard GANs augmentation and no augmentation with different subsets of training data. We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) Model. 
Code is available at: https://github.com/mahdi-darvish/GAN-augmented-pet-classifler\",\"PeriodicalId\":184716,\"journal\":{\"name\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MVIP53647.2022.9738759\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Machine Vision and Image Processing (MVIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MVIP53647.2022.9738759","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Fine-grained classification remains a challenging task because distinguishing between categories requires learning complex, localized differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although recent Vision Transformer models achieve high performance, they require an extensive volume of input data. To address this problem, we leveraged GAN-based data augmentation to generate extra dataset instances. Oxford-IIIT Pets was our dataset of choice for this experiment. It consists of 37 breeds of cats and dogs with variations in scale, pose, and lighting, which intensifies the difficulty of the classification task. Furthermore, we enhanced the recent StyleGAN2-ADA generative adversarial network (GAN) to produce more realistic images while preventing overfitting to the training set. We did this by training a customized version of MobileNetV2 to predict animal facial landmarks and then cropping images accordingly. Lastly, we combined the synthetic images with the original dataset and compared our proposed method against standard GAN augmentation and no augmentation, using different subsets of the training data. We validated our work by evaluating the accuracy of fine-grained image classification with the recent Vision Transformer (ViT) model. Code is available at: https://github.com/mahdi-darvish/GAN-augmented-pet-classifler
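
As an illustration of the landmark-detection step described above, the sketch below shows one way a MobileNetV2 backbone could be customized for facial-landmark regression. The landmark count, head layers, and loss are assumptions for illustration, not the authors' exact configuration.

```python
# Hypothetical sketch: MobileNetV2 backbone with a small regression head
# that predicts normalized (x, y) coordinates for a fixed set of facial
# landmarks. NUM_LANDMARKS is an assumed value, not taken from the paper.
import tensorflow as tf

NUM_LANDMARKS = 3  # assumed: e.g. left eye, right eye, nose

def build_landmark_model(input_size: int = 224) -> tf.keras.Model:
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(input_size, input_size, 3),
        include_top=False,
        weights="imagenet",
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    # Sigmoid keeps each coordinate in [0, 1], relative to image size.
    out = tf.keras.layers.Dense(NUM_LANDMARKS * 2, activation="sigmoid")(x)
    model = tf.keras.Model(backbone.input, out)
    model.compile(optimizer="adam", loss="mse")  # MSE on coordinates
    return model
```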
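With landmarks in hand, each image can be cropped around the detected face so that the GAN trains on consistently framed inputs. A minimal sketch, assuming the detector outputs normalized (x, y) coordinates; the margin factor is an arbitrary illustrative choice.

```python
# Hypothetical sketch: crop a square region centered on the landmark
# centroid, sized to the landmark spread plus a margin.
import numpy as np
from PIL import Image

def crop_around_landmarks(img: Image.Image, landmarks: np.ndarray,
                          margin: float = 0.4) -> Image.Image:
    """landmarks: (N, 2) array of normalized (x, y) coordinates."""
    w, h = img.size
    pts = landmarks * np.array([w, h])    # to pixel coordinates
    cx, cy = pts.mean(axis=0)             # landmark centroid
    # Half-width of the crop: largest landmark spread plus a margin.
    half = (1.0 + margin) * np.ptp(pts, axis=0).max() / 2
    left = max(0, int(cx - half))
    top = max(0, int(cy - half))
    right = min(w, int(cx + half))
    bottom = min(h, int(cy + half))
    return img.crop((left, top, right, bottom))
```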
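After StyleGAN2-ADA has been trained on the cropped images, synthetic instances can be sampled from a saved snapshot. The sketch below follows the usage pattern documented in NVlabs' stylegan2-ada-pytorch repository; it assumes that repository is on the Python path (its pickles reference its own modules), that a CUDA device is available, and that the snapshot path and sample count are placeholders.

```python
# Assumed sketch: sample synthetic images from a trained StyleGAN2-ADA
# generator snapshot, following the NVlabs stylegan2-ada-pytorch README.
import pickle

import torch
from PIL import Image

with open("network-snapshot.pkl", "rb") as f:   # hypothetical path
    G = pickle.load(f)["G_ema"].cuda()          # EMA generator weights

for seed in range(1000):                        # assumed sample count
    torch.manual_seed(seed)
    z = torch.randn([1, G.z_dim]).cuda()        # latent vector
    img = G(z, None)                            # None = unconditional
    # Generator output is roughly in [-1, 1]; map to 8-bit RGB.
    img = (img.clamp(-1, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    arr = img[0].permute(1, 2, 0).cpu().numpy()
    Image.fromarray(arr).save(f"synthetic/seed{seed:04d}.png")
```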
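Finally, the synthetic images are merged with the original dataset and a Vision Transformer is fine-tuned on the combined set. A hypothetical sketch using torchvision and timm; the directory layout, ViT variant, and hyperparameters are illustrative assumptions rather than the paper's reported setup.

```python
# Hypothetical sketch: fine-tune a ViT on real + GAN-generated images.
import timm
import torch
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])
# Both folders are assumed to share the same 37 breed subdirectories,
# so their class indices line up when concatenated.
real = datasets.ImageFolder("data/oxford_pets/train", transform=tfm)
fake = datasets.ImageFolder("data/synthetic", transform=tfm)  # GAN output
loader = DataLoader(ConcatDataset([real, fake]), batch_size=32, shuffle=True)

model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=37)  # 37 Oxford-IIIT Pet breeds
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in loader:   # one epoch shown for brevity
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```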