Authors: Sa'ad Ibrahim, Heiko Balzter, Kevin Tansey
Journal: Machine Learning with Applications, volume 16, Article 100561
Published: 2024-05-16
DOI: 10.1016/j.mlwa.2024.100561
URL: https://www.sciencedirect.com/science/article/pii/S2666827024000379
Machine learning feature importance selection for predicting aboveground biomass in African savannah with Landsat 8 and ALOS PALSAR data
In remote sensing, multiple input bands are derived from various sensors covering different regions of the electromagnetic spectrum, and each spectral band plays a unique role in land use/land cover characterization. Integrating multiple sensors is important for predicting aboveground biomass (AGB) with high accuracy, but reducing the dataset size by eliminating redundant and irrelevant spectral features is essential for enhancing the performance of machine learning algorithms: it accelerates the learning process and yields simpler, more efficient models. Our results indicate that, compared with the individual sensor datasets, the random forest (RF) classification approach using recursive feature elimination (RFE) increased the accuracy based on F score by 82.86 % and 26.19, respectively. The mutual information regression (MIR) method gives a slight increase in accuracy for individual sensor datasets, but its accuracy decreases for all models when all features are taken into account. Overall, the combination of features from Landsat 8, ALOS PALSAR backscatter, and elevation data selected with RFE provided the best AGB estimation for the RF and XGBoost models. In contrast, for k-nearest neighbors (KNN) and support vector machines (SVM), no significant improvement in AGB estimation was detected even when RFE and MIR were used. The effect of parameter optimization was more significant for RF than for the other methods. The AGB maps show patterns of AGB estimates consistent with those of the reference dataset. This study shows how prediction errors can be minimized through feature selection with different ML classifiers.
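
The two feature selection methods named in the abstract, RFE wrapped around a random forest and a mutual information ranking, can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, which the abstract does not name, and it is not the authors' pipeline. The feature names, the synthetic data, the choice of three retained features, and the coefficients that generate the target are all hypothetical and serve only to show the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE, mutual_info_regression

rng = np.random.default_rng(0)

# Hypothetical predictor names; the values are synthetic, not real imagery.
feature_names = [
    "optical_b4", "optical_b5", "optical_b6", "optical_b7",
    "sar_hh", "sar_hv", "elevation", "noise",
]
n_samples = 300
X = rng.normal(size=(n_samples, len(feature_names)))

# Synthetic stand-in for AGB, driven by three columns (an arbitrary choice).
y = (
    4.0 * X[:, 1]      # optical_b5
    + 3.0 * X[:, 5]    # sar_hv
    + 2.0 * X[:, 6]    # elevation
    + rng.normal(scale=0.3, size=n_samples)
)

# RFE: fit the forest, drop the least important feature, refit, and repeat
# until only n_features_to_select features remain.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
rfe = RFE(estimator=forest, n_features_to_select=3, step=1).fit(X, y)
rfe_selected = [name for name, keep in zip(feature_names, rfe.support_) if keep]

# MIR: score each feature on its own by its estimated mutual information
# with the target, then rank features from highest to lowest score.
mi = mutual_info_regression(X, y, random_state=0)
mi_ranked = [feature_names[i] for i in np.argsort(mi)[::-1]]
mir_selected = mi_ranked[:3]

print("RFE keeps:", rfe_selected)
print("MIR top 3:", mir_selected)
```

The design difference is that RFE is a wrapper method that re-evaluates importance after every removal, so it can account for redundancy among the remaining features, while the mutual information ranking is a filter that scores each feature independently of the model and of the other features.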