Vaibhav Thalanki, R. N. Akshayaa, R. Krithika, R. Jothi
Voice-based Image Captioning using Inception-V3 Transfer Learning Model
2023 7th International Conference on Trends in Electronics and Informatics (ICOEI), published 2023-04-11. DOI: 10.1109/ICOEI56765.2023.10125754
Abstract
This study presents a deep learning model that serves as an image caption generator. It generates descriptions of images as well-formed natural-language sentences, which are then read aloud by a text-to-speech converter. Demand for such tools is growing in fields such as assistance for the visually impaired, self-driving vehicles, and virtual assistants, which makes the development of these systems increasingly important. The proposed system combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) with an attention mechanism: specifically, the Inception V3 model serves as the CNN, and a Gated Recurrent Unit (GRU), a variant of the RNN, serves as the recurrent decoder.
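To make the CNN + attention + GRU pipeline more concrete, the sketch below shows a single caption-decoding step in plain Python. The abstract does not say which attention variant is used or how the attended image context is fed to the GRU. The sketch therefore assumes additive (Bahdanau-style) attention over a grid of image feature vectors and a GRU input formed by concatenating the context vector with the previous word's embedding. All names (`AdditiveAttention`, `GRUCell`, `decoder_step`, `rand_matrix`) are hypothetical. The weights are random and untrained, biases are omitted, and the feature size is shrunk from Inception V3's 2048 channels to a toy value, so this illustrates the data flow only, not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: rows x cols

def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def vadd(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights (hypothetical; a trained model would learn these)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class AdditiveAttention:
    """Assumed Bahdanau-style attention: score_i = v . tanh(W1 f_i + W2 h)."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int, rng: random.Random):
        self.W1 = rand_matrix(attn_dim, feat_dim, rng)
        self.W2 = rand_matrix(attn_dim, hidden_dim, rng)
        self.v = rand_matrix(1, attn_dim, rng)[0]

    def __call__(self, features: Matrix, hidden: Vector) -> Tuple[Vector, Vector]:
        proj_h = matvec(self.W2, hidden)  # identical for every image location
        scores = []
        for f in features:
            e = [math.tanh(x) for x in vadd(matvec(self.W1, f), proj_h)]
            scores.append(sum(a * b for a, b in zip(self.v, e)))
        weights = softmax(scores)  # one weight per image location, summing to 1
        # Context vector: attention-weighted sum of the location feature vectors.
        context = [sum(w * f[j] for w, f in zip(weights, features))
                   for j in range(len(features[0]))]
        return context, weights

class GRUCell:
    """One GRU step with update gate z and reset gate r (biases omitted)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        # W* act on the input x; U* act on the previous hidden state h.
        self.Wz = rand_matrix(hidden_dim, input_dim, rng)
        self.Uz = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wr = rand_matrix(hidden_dim, input_dim, rng)
        self.Ur = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wh = rand_matrix(hidden_dim, input_dim, rng)
        self.Uh = rand_matrix(hidden_dim, hidden_dim, rng)

    def __call__(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a) for a in vadd(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a) for a in vadd(matvec(self.Wr, x), matvec(self.Ur, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in vadd(matvec(self.Wh, x), matvec(self.Uh, reset_h))]
        # New state interpolates between the previous state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def decoder_step(word_embedding: Vector, features: Matrix, hidden: Vector,
                 attention: AdditiveAttention, gru: GRUCell,
                 w_out: Matrix) -> Tuple[Vector, Vector, Vector]:
    """One decoding step: attend over the image, update the GRU, score the vocabulary."""
    context, weights = attention(features, hidden)
    gru_input = context + word_embedding  # list concatenation: [context ; embedding]
    new_hidden = gru(gru_input, hidden)
    logits = matvec(w_out, new_hidden)  # one unnormalized score per vocabulary word
    return logits, new_hidden, weights

if __name__ == "__main__":
    rng = random.Random(0)
    # Inception V3 on a 299x299 image yields an 8x8x2048 feature map, i.e. 64
    # locations of 2048 features. We keep 64 locations but shrink 2048 to 16.
    n_loc, feat_dim, embed_dim, hidden_dim, attn_dim, vocab = 64, 16, 8, 12, 10, 20
    features = rand_matrix(n_loc, feat_dim, rng)
    attention = AdditiveAttention(feat_dim, hidden_dim, attn_dim, rng)
    gru = GRUCell(feat_dim + embed_dim, hidden_dim, rng)
    w_out = rand_matrix(vocab, hidden_dim, rng)
    start_embedding = rand_matrix(1, embed_dim, rng)[0]  # hypothetical <start> token
    logits, hidden, weights = decoder_step(start_embedding, features,
                                           [0.0] * hidden_dim, attention, gru, w_out)
    next_id = max(range(vocab), key=lambda i: logits[i])
    print("next token id:", next_id, "| attention weight sum:", round(sum(weights), 6))
```

In a complete captioner, a step like this would typically be repeated: the highest-scoring (or sampled) word is embedded and fed back in until an end token is produced, and the finished caption string is then passed to the text-to-speech stage. Recomputing the attention weights at every step lets the decoder focus on different image regions for different words.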