Ali Shebl, Dávid Abriha, Maher Dawoud, Mosaad Ali Hussein Ali, Árpád Csámer
PRISMA vs. Landsat 9 in lithological mapping − a K-fold Cross-Validation implementation with Random Forest
DOI: 10.1016/j.ejrs.2024.07.003
Published: 2024-07-15
https://www.sciencedirect.com/science/article/pii/S1110982324000553
Abstract
Selecting an optimal dataset is crucial for successful remote sensing analysis. The PRISMA hyperspectral sensor (240 spectral bands) and Landsat 9 OLI-2 (high radiometric resolution) offer robust data for a wide range of remote sensing applications, and demand for both is expected to grow in the coming years. Despite this potential, we have not found a rigorous evaluation of the two datasets in geological applications using machine learning algorithms. We therefore conduct a comprehensive comparison using Random Forest (RF), a widely recommended machine learning algorithm, with K-fold cross-validation (K = 2, 5, 10) and grid-search hyperparameter tuning. To support feature selection and extraction, several image-processing approaches were applied, including Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), and Independent Component Analysis (ICA). To improve RF performance, each target was represented by well-distributed points rather than polygons, which mitigates the effects of spatial autocorrelation. Our results reveal dataset-hyperparameter dependencies: PRISMA is mainly influenced by max_depth and Landsat 9 by max_features. Grid search balances dataset characteristics against data splitting (folds) and generates accurate lithological maps for all K values. Notably, a significant hyperparameter shift at K = 10 produces the best lithological maps. Fieldwork and petrographic investigations validate the lithological maps and indicate a slight superiority of PRISMA over Landsat OLI-2. Even so, given the difference in dataset nature and band count, we still advocate Landsat 9 as a potent multispectral input for future applications because of its superior radiometric resolution.
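The tuning procedure described in the abstract (Random Forest, K-fold cross-validation with K = 2, 5, 10, and a grid search over max_depth and max_features) can be sketched with scikit-learn as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the class count, band count, PCA component count, and grid values are all hypothetical, and the sketch does not reproduce the point-based sampling used to mitigate spatial autocorrelation.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for labelled pixel spectra (NOT the paper's data):
# 300 sample points, 40 "bands", 4 hypothetical lithological classes.
X, y = make_classification(
    n_samples=300,
    n_features=40,
    n_informative=10,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# PCA as one example of the feature-extraction step, followed by RF.
# The component count and tree count are assumptions for illustration.
pipeline = Pipeline([
    ("pca", PCA(n_components=10)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Illustrative grid over the two hyperparameters named in the abstract.
param_grid = {
    "rf__max_depth": [5, 10, None],
    "rf__max_features": ["sqrt", "log2", None],
}

results = {}
for k in (2, 5, 10):
    # Stratified folds keep class proportions similar in every split.
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    results[k] = (search.best_params_, search.best_score_)
    print(f"K={k}: best params={search.best_params_}, "
          f"mean CV accuracy={search.best_score_:.3f}")
```

Comparing the entries of `results` across K shows whether the selected max_depth and max_features change with the number of folds, which is the kind of dataset-hyperparameter dependency the abstract reports. With real imagery, `X` would hold the per-point band values (or PCA/MNF/ICA components) and `y` the lithological class labels.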