Transforming Earth Observation: An Extensive Evaluation of Vision Transformers for Satellite Images-Based Land Cover Classification
Author: Fakhri Alam Khan
Journal: Expert Systems, 42(7), Journal Article, published 2025-06-10
DOI: 10.1111/exsy.70082
URL: https://onlinelibrary.wiley.com/doi/10.1111/exsy.70082
Impact factor: 2.3 (JCR Q2, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Satellite imagery offers rich information for land cover classification, but choosing a feature extractor (backbone architecture) that is both effective and efficient remains challenging. In this study, I benchmark 25 vision transformers across 10 public land cover datasets to guide backbone selection for downstream classification tasks. The proposed approach encodes each satellite image into a fixed-length feature vector via a pre-trained transformer, then trains and tests a linear support-vector classifier on these encodings to isolate the impact of the backbone alone. I report average classification accuracy and F1-score over three random stratified splits per dataset, and I also measure training time to assess the computational cost. Results show that image encodings produced by large-receptive-field transformers with advanced self-attention, particularly deit3_base_patch16_224 and twins_svt_large, achieve the highest accuracies without incurring prohibitive training times. In contrast, encodings from the compact variants train faster but incur notable performance drops of around 7%–8%. These findings reveal a clear trade-off between representational power and efficiency. Practitioners can use these rankings to select a transformer backbone that best balances accuracy and computational efficiency for satellite image-based land cover classification, accelerating the development of robust and resource-aware systems.
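The evaluation protocol described in the abstract (fixed encodings, a linear support-vector classifier, three stratified splits, averaged accuracy and F1, and timed training) can be sketched roughly as follows. This is a minimal illustration and not the author's code: it assumes scikit-learn and NumPy, uses synthetic vectors in place of real transformer encodings, and the 80/20 split ratio, macro-averaged F1, and classifier settings are assumptions that the abstract does not specify. The function name `evaluate_encodings` is hypothetical.

```python
import time

import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import LinearSVC


def evaluate_encodings(features, labels, n_splits=3, test_size=0.2, seed=0):
    """Score fixed feature vectors with a linear SVC over stratified splits.

    test_size and macro F1 averaging are assumptions, not values from the paper.
    """
    splitter = StratifiedShuffleSplit(
        n_splits=n_splits, test_size=test_size, random_state=seed
    )
    accuracies, f1_scores, train_times = [], [], []
    for train_idx, test_idx in splitter.split(features, labels):
        classifier = LinearSVC(max_iter=5000)
        # Time only the classifier fit, since the backbone stays frozen.
        start = time.perf_counter()
        classifier.fit(features[train_idx], labels[train_idx])
        train_times.append(time.perf_counter() - start)

        predictions = classifier.predict(features[test_idx])
        accuracies.append(accuracy_score(labels[test_idx], predictions))
        f1_scores.append(f1_score(labels[test_idx], predictions, average="macro"))

    # Report the mean over all splits, as the abstract describes.
    return {
        "accuracy": float(np.mean(accuracies)),
        "f1": float(np.mean(f1_scores)),
        "train_seconds": float(np.mean(train_times)),
    }


# Synthetic stand-in for backbone encodings: 3 classes, 60 samples each,
# 16-dimensional vectors with well-separated class means.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 60, 16
labels = np.repeat(np.arange(n_classes), per_class)
class_means = 8.0 * np.eye(n_classes, dim)
features = rng.normal(size=(n_classes * per_class, dim)) + class_means[labels]

results = evaluate_encodings(features, labels)
print(results)
```

In a real comparison, `features` would hold the encodings produced by one pre-trained backbone for one dataset, and the same function would be called once per backbone so that only the encoder differs between runs.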
About the journal:
Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper.
As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.