Language Integration in Remote Sensing: Tasks, datasets, and future directions
Laila Bashmal, Yakoub Bazi, Farid Melgani, Mohamad M. Al Rahhal, Mansour Abdulaziz Al Zuair
IEEE Geoscience and Remote Sensing Magazine, 2023. DOI: 10.1109/mgrs.2023.3316438
Abstract
The emerging field of vision–language models, which combines computer vision and natural language processing (NLP), has attracted significant research interest. This integration has opened up new research opportunities, particularly in remote sensing (RS), where it has the potential to enhance the capabilities of RS systems. In this context, this article presents a comprehensive review of more than 100 articles on the integration of NLP techniques into RS understanding research. The review covers various vision–language modeling tasks, including, but not limited to, RS image captioning, RS text-to-image retrieval, RS visual question answering (VQA), and RS image generation. For each task, the review summarizes the state-of-the-art developments, including methods, evaluation metrics, datasets, and experimental results on benchmark datasets. The review concludes by discussing the key challenges and highlighting potential directions for future research, with the aim of inspiring further work in this important field.
About the journal:
The IEEE Geoscience and Remote Sensing Magazine (GRSM) serves as an informative platform, keeping readers abreast of activities within the IEEE GRS Society, its technical committees, and chapters. In addition to updating readers on society-related news, GRSM plays a crucial role in educating and informing its audience through various channels. These include: Technical Papers, International Remote Sensing Activities, Contributions on Education Activities, Industrial and University Profiles, Conference News, Book Reviews, and a Calendar of Important Events.