Generating Descriptive Text From Images - Image Caption Generator

International Journal of Advanced Research in Science, Communication and Technology. Pub Date: 2024-07-23. DOI: 10.48175/ijarsct-19225

Avdhi Pagariya, Riddhi Jain


Citations: 0

Abstract

Image captioning, the process of generating a textual description of an image, has become a widely used tool, and many built-in applications that caption images rely on deep neural network models. Captioning requires recognizing the important objects in an image, their attributes, and the relationships among them, and then producing a syntactically and semantically correct sentence. In this paper, we present a deep learning model that describes images and generates captions using computer vision and machine translation techniques. The model aims to detect the objects in an image, recognize the relationships between those objects, and generate a caption. The experiment uses the Flickr8k dataset and Python 3, and applies transfer learning through the caption model. The paper also elaborates on the functions and structure of the neural networks involved. Image caption generation is an important task in computer vision and natural language processing. Caption generators can find applications in image segmentation, as used by Facebook and Google Photos, and can be extended to video frames. They can automate the work of a person who has to interpret images, and they have considerable scope for helping visually impaired people.
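The abstract does not say how the caption model turns image features into a sentence. As a rough illustration only, the sketch below assumes a common arrangement: a pretrained network (the transfer learning step) supplies a feature vector, and a decoder proposes the next word until an end marker appears. The names `greedy_caption`, `toy_decoder`, the `<start>`/`<end>` tokens, and the greedy word choice are all hypothetical and are not taken from the paper; `toy_decoder` is a stand-in for a trained network.

```python
from typing import Callable, Dict, List, Sequence

# Assumed sentence markers; the paper does not specify its tokens.
START, END = "<start>", "<end>"


def greedy_caption(
    image_features: Sequence[float],
    next_word_scores: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Build a caption one word at a time, always taking the top-scoring word."""
    words = [START]
    for _ in range(max_len):
        scores = next_word_scores(image_features, words)
        best = max(scores, key=scores.get)
        if best == END:
            break
        words.append(best)
    # Drop the start marker before returning the caption text.
    return " ".join(words[1:])


def toy_decoder(features: Sequence[float], words: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder: walks a fixed sentence."""
    sentence = ["a", "dog", "runs", END]
    position = len(words) - 1  # number of words generated so far
    target = sentence[min(position, len(sentence) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in sentence}


# The feature values here are placeholders, not real CNN outputs.
caption = greedy_caption([0.1, 0.7, 0.2], toy_decoder)
print(caption)
```

In a real system the scoring function would be the trained decoder applied to features extracted from the image, and a search strategy other than greedy selection could be substituted without changing the loop's overall shape.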
