Deep learning approach for image captioning in Hindi language

2020 International Conference on Computer, Electrical & Communication Engineering (ICCECE) Pub Date : 2020-01-01 DOI:10.1109/ICCECE48148.2020.9223087

Ankit Rathi


Citations: 7

Abstract

Automatically generating a description of an image from its content is one of the fundamental problems in artificial intelligence. The task, called "image caption generation", draws on both computer vision and natural language processing. Much research has been carried out in this field, but it has focused mainly on generating image descriptions in English, because existing image caption datasets are mostly in English. Image captioning, however, should not be restricted by language. The lack of image captioning datasets in languages other than English is a problem, especially for a morphologically rich language such as Hindi. To tackle this problem, this research constructed a Hindi image caption dataset, called the Flickr8k-Hindi Datasets, based on images from the Flickr8k dataset using Google Cloud translator. The Flickr8k-Hindi Datasets consist of four datasets, distinguished by the number of descriptions per image and by whether the descriptions are clean or unclean. This study also identifies the most effective method for generating image descriptions with an encoder-decoder neural network model. The experiments showed that training the model with a single clean description per image generates higher-quality captions than training it with five uncleaned descriptions per image, although the model trained with five uncleaned descriptions per image achieved a BLEU-1 score of 0.585, which is the current state of the art.
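
The BLEU-1 figure quoted in the abstract measures unigram overlap between a generated caption and its reference descriptions. The following is a minimal sentence-level sketch of that metric (clipped unigram precision multiplied by a brevity penalty). It is not the authors' evaluation code: the function name `bleu1`, the whitespace tokenization, and the Hindi example sentences are assumptions made for illustration, and reported scores of this kind are usually computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption, not the paper's method.
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each word, the highest count it reaches in any single reference.
    max_ref = Counter()
    for ref in refs:
        for word, count in Counter(ref).items():
            max_ref[word] = max(max_ref[word], count)

    # Clip each candidate word count by its maximum reference count.
    clipped = sum(min(count, max_ref[word])
                  for word, count in Counter(cand).items())
    precision = clipped / len(cand)

    # Reference length closest to the candidate length (ties: shorter one).
    ref_len = min((len(r) for r in refs),
                  key=lambda n: (abs(n - len(cand)), n))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision


# Made-up example: 6 of the 7 candidate words appear in the reference.
score = bleu1("एक कुत्ता घास पर दौड़ रहा है",
              ["एक कुत्ता घास में दौड़ रहा है"])
print(round(score, 3))
```

Having several reference descriptions per image gives the clipping step more words to match against, which is one reason a model evaluated against five descriptions can reach a higher BLEU-1 score without producing better captions.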
