Evaluation of segment anything model (SAM) for automated labelling in machine learning classification of UAV geospatial data
Bhargav Parulekar, Nischal Singh, Anandakumar M. Ramiya
Earth Science Informatics, published 2024-07-05. DOI: 10.1007/s12145-024-01402-7 (https://doi.org/10.1007/s12145-024-01402-7)
Abstract
As urban planning and development become increasingly digitized, accurate object classification is becoming vital. Machine learning models that classify a broad region effectively depend on accurately labelled datasets for object extraction, yet generating enough labelled data remains challenging. The Segment Anything Model (SAM), a recently developed AI-assisted segmentation approach, offers a way to improve the labelling of complex and intricate image structures: it can improve the accuracy and consistency of annotations while significantly reducing annotation time. This paper assesses how well SAM-annotated labels work for training machine learning models on high-resolution remote sensing data captured by UAVs (Unmanned Aerial Vehicles) in the peri-urban region of Anad, Kerala, India. A comparative analysis evaluated training datasets generated with SAM against datasets labelled manually with existing tools, using multiple machine learning models: Random Forest, Support Vector Machine, and XGBoost. XGBoost trained on SAM-annotated labels reached an accuracy of 78%, whereas the same algorithm trained on the manually labelled dataset reached only 68%. Random Forest showed a similar pattern, with accuracies of 78% and 60% for SAM-annotated and manual labels, respectively. These results indicate that the SAM-based segmentation method is more effective and dependable for producing accurate results.
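The comparison reported in the abstract can be restated as simple arithmetic. The sketch below is illustrative only and is not the authors' code: it assumes that "accuracy" means overall accuracy (correct predictions divided by total predictions), which the abstract does not state explicitly. The names `overall_accuracy`, `REPORTED_ACCURACY`, and `gain_in_points` are hypothetical. Only the XGBoost and Random Forest figures appear in the abstract, so Support Vector Machine is left out rather than guessed.

```python
from typing import Dict, Sequence


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class equals the reference class.

    Assumption: this is the metric behind the percentages in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Accuracies (percent) exactly as reported in the abstract.
# SVM results are not given there, so SVM is intentionally omitted.
REPORTED_ACCURACY: Dict[str, Dict[str, int]] = {
    "XGBoost": {"sam": 78, "manual": 68},
    "Random Forest": {"sam": 78, "manual": 60},
}


def gain_in_points(acc: Dict[str, int]) -> int:
    """Percentage-point difference between SAM-labelled and manual training."""
    return acc["sam"] - acc["manual"]


for model, acc in REPORTED_ACCURACY.items():
    print(f"{model}: SAM {acc['sam']}% vs manual {acc['manual']}% "
          f"(+{gain_in_points(acc)} points)")
```

Read this way, the SAM-annotated labels correspond to a 10-point gain for XGBoost and an 18-point gain for Random Forest. A fair comparison of this kind presumably scores both training sets against the same held-out reference data, though the abstract does not describe the evaluation split.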
About the journal:
The Earth Science Informatics [ESIN] journal aims at rapid publication of high-quality, current, cutting-edge, and provocative scientific work in the area of Earth Science Informatics as it relates to Earth systems science and space science. This includes articles on the application of formal and computational methods, computational Earth science, spatial and temporal analyses, and all aspects of computer applications to the acquisition, storage, processing, interchange, and visualization of data and information about the materials, properties, processes, features, and phenomena that occur at all scales and locations in the Earth system’s five components (atmosphere, hydrosphere, geosphere, biosphere, cryosphere) and in space (see "About this journal" for more detail). The quarterly journal publishes research, methodology, and software articles, as well as editorials, comments, and book and software reviews. Review articles of relevant findings, topics, and methodologies are also considered.