Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods

Mohammad Saif Wajid, Hugo Terashima-Marin, Peyman Najafirad, Mohd Anas Wajid

Engineering Reports (Wiley, open access), published 2023-10-12. DOI: 10.1002/eng2.12785. Available at https://onlinelibrary.wiley.com/doi/10.1002/eng2.12785
Generating image/video captions has long been a fundamental problem in Artificial Intelligence, typically addressed by combining Deep Learning, Computer Vision, Knowledge Graphs, and Natural Language Processing (NLP). The central task of image/video captioning is to describe visual content in natural language. This is difficult because of the semantic gap: current systems must bridge low-level visual features and high-level semantic concepts when mapping images or videos to syntactically and semantically correct descriptions. Tackling this problem requires a clear account of the latest research, the methods proposed to overcome these difficulties, and effective solutions. This work thoroughly analyses and investigates the most relevant methods (deep learning and knowledge graph-based approaches), benchmark datasets, and evaluation metrics, along with their benefits and limitations. We also review state-of-the-art image/video captioning methods and their current applications, and compare reported results on benchmark datasets. Finally, we discuss existing challenges and future directions for research.
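To make the discussion of evaluation metrics concrete, the sketch below shows a simplified, sentence-level BLEU-style score, one of the n-gram overlap metrics commonly reported on captioning benchmarks. It is a minimal illustration only, with add-one smoothing and a single-sentence brevity penalty; it is not the official corpus-level implementation (e.g., the COCO caption evaluation toolkit) used to report results in the literature.

```python
# Minimal sketch of a BLEU-style caption score: clipped n-gram precision,
# geometric mean over n = 1..4, and a brevity penalty. Illustrative only.
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-like score for one candidate caption."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        # Add-one smoothing so one missing n-gram order does not zero the score.
        log_precisions.append(math.log((clipped + 1) / (total + 1)))

    # Brevity penalty against the reference length closest to the candidate's.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    refs = ["a man is riding a horse on the beach",
            "a person rides a horse along the shore"]
    print(round(bleu("a man rides a horse on the beach", refs), 3))
```

A real evaluation pipeline would score an entire test split at once and report BLEU alongside metrics such as METEOR and CIDEr; this sketch only conveys the basic n-gram matching idea behind such scores.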