Vaibhav Thalanki, R. N. Akshayaa, R. Krithika, R. Jothi
2023 7th International Conference on Trends in Electronics and Informatics (ICOEI), published 2023-04-11. DOI: 10.1109/ICOEI56765.2023.10125754 (https://doi.org/10.1109/ICOEI56765.2023.10125754)
Voice-based Image Captioning using Inception-V3 Transfer Learning Model
This study presents a deep learning image caption generator that describes an image in a natural-language sentence, which is then read aloud by a text-to-speech translator. Demand for such tools is growing in fields such as assistance for the visually impaired, self-driving vehicles, and virtual assistants, so developing these systems has become increasingly important. The proposed system combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN) and attention; specifically, it uses the Inception V3 model and an RNN variant, the Gated Recurrent Unit (GRU).
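To make the CNN-plus-attention-plus-GRU combination concrete, the following is a minimal sketch of a single decoding step, not the authors' implementation. It rests on several assumptions the abstract does not confirm: additive (Bahdanau-style) attention, a context vector concatenated with the previous word embedding as the GRU input, toy dimensions, and random vectors standing in for the region features a CNN such as Inception V3 would produce. All names (`additive_attention`, `gru_step`, the dimension constants) are hypothetical, the weights are untrained, and the text-to-speech stage is omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def additive_attention(features, hidden, W_f, W_h, v):
    """Score each image region against the decoder state (assumed attention form)."""
    h_proj = matvec(W_h, hidden)
    scores = []
    for f in features:
        f_proj = matvec(W_f, f)
        e = [math.tanh(a + b) for a, b in zip(f_proj, h_proj)]
        scores.append(sum(vi * ei for vi, ei in zip(v, e)))
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


def gru_step(x, h, p):
    """One GRU update, using the h' = (1 - z) * h + z * candidate convention."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


rng = random.Random(0)
NUM_REGIONS, FEAT_DIM, EMBED_DIM, HIDDEN_DIM, ATTN_DIM = 4, 6, 5, 3, 4

# Stand-in for CNN region features (hypothetical toy sizes, not Inception V3's).
features = rand_matrix(NUM_REGIONS, FEAT_DIM, rng)
hidden = [0.0] * HIDDEN_DIM  # initial decoder state
word_embedding = [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]

W_f = rand_matrix(ATTN_DIM, FEAT_DIM, rng)
W_h = rand_matrix(ATTN_DIM, HIDDEN_DIM, rng)
v = [rng.uniform(-0.1, 0.1) for _ in range(ATTN_DIM)]

in_dim = FEAT_DIM + EMBED_DIM
params = {
    "Wz": rand_matrix(HIDDEN_DIM, in_dim, rng),
    "Uz": rand_matrix(HIDDEN_DIM, HIDDEN_DIM, rng),
    "Wr": rand_matrix(HIDDEN_DIM, in_dim, rng),
    "Ur": rand_matrix(HIDDEN_DIM, HIDDEN_DIM, rng),
    "Wh": rand_matrix(HIDDEN_DIM, in_dim, rng),
    "Uh": rand_matrix(HIDDEN_DIM, HIDDEN_DIM, rng),
}

context, weights = additive_attention(features, hidden, W_f, W_h, v)
new_hidden = gru_step(context + word_embedding, hidden, params)
```

In a full captioning system, `new_hidden` would feed a projection over the vocabulary to pick the next word, and the loop would repeat until an end-of-sentence token is produced; the finished caption would then be passed to a text-to-speech component.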