{"title":"基于图像高级语义属性的图像字幕与视觉问答语义关注网络","authors":"Angelin Gladston, D. Balaji","doi":"10.4018/ijbdia.313201","DOIUrl":null,"url":null,"abstract":"The main challenge in the vision-to-language system is generation of the caption with a proper meaningful answer for a question and extracting even the minute details from the image. The main contributions in this paper are presenting an approach based on image high-level semantic attributes and local image features address the challenges of V2L tasks. Especially, the high-level semantic attributes information is used to reduce the semantic gap between images and text. A novel semantic attention network is designed to explore the mapping relationships between semantic attributes and image regions. The semantic attention network highlights the concept-related regions and selects the region-related concepts. Two special V2L tasks, image captioning and VQA, are addressed by the proposed approach. Improved BLEU score shows the proposed image captioning performs well. The experimental results show that the proposed model is effective for V2L tasks.","PeriodicalId":398232,"journal":{"name":"Int. J. Big Data Intell. Appl.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic Attention Network for Image Captioning and Visual Question Answering Based on Image High-Level Semantic Attributes\",\"authors\":\"Angelin Gladston, D. Balaji\",\"doi\":\"10.4018/ijbdia.313201\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The main challenge in the vision-to-language system is generation of the caption with a proper meaningful answer for a question and extracting even the minute details from the image. 
The main contributions in this paper are presenting an approach based on image high-level semantic attributes and local image features address the challenges of V2L tasks. Especially, the high-level semantic attributes information is used to reduce the semantic gap between images and text. A novel semantic attention network is designed to explore the mapping relationships between semantic attributes and image regions. The semantic attention network highlights the concept-related regions and selects the region-related concepts. Two special V2L tasks, image captioning and VQA, are addressed by the proposed approach. Improved BLEU score shows the proposed image captioning performs well. The experimental results show that the proposed model is effective for V2L tasks.\",\"PeriodicalId\":398232,\"journal\":{\"name\":\"Int. J. Big Data Intell. Appl.\",\"volume\":\"60 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Big Data Intell. Appl.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/ijbdia.313201\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Big Data Intell. Appl.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/ijbdia.313201","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semantic Attention Network for Image Captioning and Visual Question Answering Based on Image High-Level Semantic Attributes
The main challenge in vision-to-language (V2L) systems is generating a caption, or a proper and meaningful answer to a question, while capturing even the minute details of an image. The main contribution of this paper is an approach based on high-level semantic image attributes and local image features that addresses the challenges of V2L tasks. In particular, high-level semantic attribute information is used to reduce the semantic gap between images and text. A novel semantic attention network is designed to explore the mapping relationships between semantic attributes and image regions: it highlights the concept-related regions and selects the region-related concepts. The proposed approach is applied to two representative V2L tasks, image captioning and visual question answering (VQA). Improved BLEU scores show that the proposed image captioning model performs well, and the experimental results show that the proposed model is effective for V2L tasks.
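The bidirectional selection the abstract describes — attributes attending over regions to highlight concept-related regions, and regions attending over attributes to select region-related concepts — can be sketched with a minimal NumPy example. The dot-product scoring, the dimensions, and all names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_attention(regions, attributes):
    """Sketch of bidirectional attention between regions and attributes.

    regions:    (N, D) local image-region features
    attributes: (K, D) embeddings of high-level semantic attributes
    Returns attribute-grounded region summaries and
    region-grounded attribute summaries.
    """
    # Affinity of each attribute to each region (simple dot product).
    scores = attributes @ regions.T                    # (K, N)
    # For each attribute: which regions matter (highlights regions).
    region_weights = softmax(scores, axis=1)           # rows sum to 1
    # For each region: which attributes matter (selects concepts).
    attr_weights = softmax(scores.T, axis=1)           # rows sum to 1
    attended_regions = region_weights @ regions        # (K, D)
    attended_attrs = attr_weights @ attributes         # (N, D)
    return attended_regions, attended_attrs

# Toy example with 5 regions and 3 attributes, feature size 8.
rng = np.random.default_rng(0)
R = rng.normal(size=(5, 8))
A = rng.normal(size=(3, 8))
ar, aa = semantic_attention(R, A)
print(ar.shape, aa.shape)  # (3, 8) (5, 8)
```

The attended outputs could then feed a caption decoder or a VQA answer classifier; in a trained model the raw dot product would typically be replaced by a learned, parameterized scoring function.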