Generating Descriptive Text From Images - Image Caption Generator

Avdhi Pagariya, Riddhi Jain
Journal: International Journal of Advanced Research in Science, Communication and Technology
Volume (as listed): 129 37
Publication date: 2024-07-23
Publication type: Journal Article
DOI: 10.48175/ijarsct-19225 (https://doi.org/10.48175/ijarsct-19225)
Source platform: Semanticscholar
Citations: 0

Abstract

Image captioning has become one of the most widely needed tools, and many applications now ship with built-in features that generate a caption for a given image using deep neural network models. Image captioning is the process of generating a description of an image. It requires recognizing the important objects in the image, their attributes, and the relationships among them, and then producing a syntactically and semantically correct sentence. In this paper, we present a deep learning model that describes images and generates captions using computer vision and machine translation. The aim is to detect the different objects in an image, recognize the relationships between those objects, and generate captions. The dataset used is Flickr8k, the programming language is Python 3, and a machine learning technique called transfer learning is applied with the help of the caption model to demonstrate the proposed experiment. The paper also elaborates on the functions and structure of the various neural networks involved. Generating image captions is an important aspect of computer vision and natural language processing. Image caption generators can find applications in image segmentation, as used by Facebook and Google Photos, and their use can be extended to video frames. They can automate the work of a person who has to interpret images, and they have immense scope for helping visually impaired people.