{"title":"辅助视觉图像标题机器人","authors":"Prof.Anandkumar Birajdar","doi":"10.55041/ijsrem34573","DOIUrl":null,"url":null,"abstract":"It's challenging to automatically produce brief descriptions of an image's meaning because it can have diverse connotations in different languages. However, due to the vast amount of information packed into a single image, it is challenging to parse out the necessary context to use it to build sentences. It's a great way for the visually impaired to get around independently. This type of system can be built using the emerging programming technique of deep learning. This paper presents the development of an Image Caption Bot designed to aid individuals with visual impairments. We achieve enhanced accuracy in caption generation by modeling on the MSCOCO dataset using a Transformer encoder and Inception v3 for image processing. Image captioning, which entails generating textual descriptions for images, is the primary focus of our research. We achieve enhanced accuracy in caption generation by utilizing a Transformer encoder during training. The MSCOCO dataset serves as a valuable The results of the model are translated into speech for the benefit of the visually handicapped. Keywords—CNN, Google Text To Speech, MS-COCO, Inspection v3.","PeriodicalId":13661,"journal":{"name":"INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT","volume":"87 6","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Image Caption Bot for Assistive Vision\",\"authors\":\"Prof.Anandkumar Birajdar\",\"doi\":\"10.55041/ijsrem34573\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It's challenging to automatically produce brief descriptions of an image's meaning because it can have diverse connotations in different languages. However, due to the vast amount of information packed into a single image, it is challenging to parse out the necessary context to use it to build sentences. It's a great way for the visually impaired to get around independently. This type of system can be built using the emerging programming technique of deep learning. This paper presents the development of an Image Caption Bot designed to aid individuals with visual impairments. We achieve enhanced accuracy in caption generation by modeling on the MSCOCO dataset using a Transformer encoder and Inception v3 for image processing. Image captioning, which entails generating textual descriptions for images, is the primary focus of our research. We achieve enhanced accuracy in caption generation by utilizing a Transformer encoder during training. The MSCOCO dataset serves as a valuable The results of the model are translated into speech for the benefit of the visually handicapped. 
Keywords—CNN, Google Text To Speech, MS-COCO, Inspection v3.\",\"PeriodicalId\":13661,\"journal\":{\"name\":\"INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT\",\"volume\":\"87 6\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.55041/ijsrem34573\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.55041/ijsrem34573","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
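The abstract does not include implementation details, so the following is only a minimal sketch of the described pipeline, assuming a TensorFlow/Keras implementation: Inception v3 with its classification head removed extracts a grid of image features, and a single Transformer encoder block contextualizes those features for a downstream caption decoder. The `EncoderBlock` class, layer sizes, and the example image path are illustrative assumptions, not values taken from the paper.

```python
import tensorflow as tf

# Inception v3 as a feature extractor (classification head removed); the paper
# names Inception v3 for image processing, but this exact setup is an assumption.
cnn = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")

def extract_features(image_path):
    """Read an image, resize to 299x299, and return a (1, 64, 2048) feature grid."""
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    feats = cnn(tf.expand_dims(img, 0))                 # (1, 8, 8, 2048)
    return tf.reshape(feats, (1, -1, feats.shape[-1]))  # flatten spatial grid

# One Transformer encoder block over the image patch features
# (dimensions are illustrative, not reported in the paper).
class EncoderBlock(tf.keras.layers.Layer):
    def __init__(self, d_model=512, num_heads=8, ff_dim=2048):
        super().__init__()
        self.proj = tf.keras.layers.Dense(d_model, activation="relu")
        self.attn = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ff_dim, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization()
        self.norm2 = tf.keras.layers.LayerNormalization()

    def call(self, x):
        x = self.proj(x)                     # project CNN features to d_model
        x = self.norm1(x + self.attn(x, x))  # self-attention + residual
        return self.norm2(x + self.ffn(x))   # feed-forward + residual

features = extract_features("example.jpg")   # hypothetical input image
encoded = EncoderBlock()(features)           # context for a caption decoder
```

In this arrangement the encoder output would be fed to a caption decoder trained on MSCOCO image-caption pairs; the decoder itself is not shown because the abstract gives no detail about it.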
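The abstract also states that generated captions are translated into speech using Google Text To Speech. A common way to do this from Python is the gTTS package; the sketch below assumes that choice, and the helper name and example caption are hypothetical.

```python
from gtts import gTTS

def speak_caption(caption, out_path="caption.mp3", lang="en"):
    """Convert a generated caption to an MP3 file with Google Text-to-Speech."""
    gTTS(text=caption, lang=lang).save(out_path)

# Example usage with a caption the model might produce
speak_caption("a man riding a bicycle down a city street")
```

The resulting audio file can then be played back to the user, completing the image-to-speech path described in the abstract.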