Image Captioning Techniques: A Review
Anbara Z Al-Jamal, Maryam J Bani-Amer, Shadi A. Aljawarneh
2022 International Conference on Engineering & MIS (ICEMIS), published 2022-07-04
DOI: 10.1109/ICEMIS56295.2022.9914173
Citations: 2
Abstract
Image captioning is the process of generating accurate and descriptive captions for images. As a recently emerged research area, it is attracting increasing attention. To achieve this goal, the semantic information of an image must be captured and expressed in natural language. A captioning system needs to identify objects, actions, their relationships, and some salient features that may be missing from the image. After identification, the next step is to generate the most relevant and concise description of the image, which should be syntactically and semantically correct. Deep learning techniques can handle this process with convolutional neural networks (CNNs) and long short-term memory networks (LSTMs). In this survey, we first discuss the techniques used in early work, which are mainly retrieval-based and template-based. We then focus on neural network-based techniques, which achieve the best current results. These techniques are further divided into subcategories based on the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets. Finally, directions for future research are discussed.