Evaluating Random Forest Model Performance for Cave and Sinkhole Prediction in the Cradle of Humankind, South Africa: Preliminary Analysis and Variable Importance Assessments

IF 2.8 | Zone 1, History | JCR Q1, Anthropology
Margaret J. Furtner, Robert L. Anemone, Lei Wang, Juliet K. Brophy
Journal of Archaeological Method and Theory. Journal article. DOI: 10.1007/s10816-025-09761-1 (https://doi.org/10.1007/s10816-025-09761-1). Publication date: 2026-03-04.
Citation count: 0

Abstract

Surveying an area for new fossil sites is a labor-intensive and resource-draining activity that can be alleviated with the aid of machine learning models. In karst landscapes of southern Africa, Plio-Pleistocene fossils that inform the paleoanthropological record are primarily found preserved in caves and sinkholes. The purpose of this study is to assess the utility of Random Forest (RF) models for cave and sinkhole prediction in the Cradle of Humankind, South Africa. Multispectral satellite imagery, digital elevation models (DEMs), and geologic maps were converted into raster (pixelated matrix) images in a GIS environment to denote varying aspects of the local topography, including elevation, slope, aspect, curvature, drainage, spectral reflectance, vegetation cover, fault proximity, and underlying geology. The rasters were stacked and overlaid with 1080 known cave and sinkhole locality points and 1080 random non-cave points in the study area for model training. Variable values associated with these geopoints were input into an RF model in Python for training and evaluation using a spatial ten-fold cross-validation. The model performed with 81.6% accuracy and an area under the curve (AUC) of 0.912. The importance of each variable for prediction was evaluated by measuring the increase in prediction error when variable values were shuffled. Distance to major faults, location within the Chuniespoort geologic group, dolomite presence, chert presence, and elevation exhibited the highest importance for model accuracy, while three out of 48 total predictor variables exhibited less importance than a randomly generated variable. The identification of important/unimportant variables will help build more efficient, robust models in future iterations, as well as help identify variables that could be useful in other karst regions.
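
The workflow described in the abstract (an RF classifier, spatial ten-fold cross-validation, and shuffle-based variable importance compared against a random variable) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic data in place of the raster-derived values, and approximates spatial cross-validation by grouping points into grid blocks with `GroupKFold`. The variable names (`fault_dist`, `elevation`, `random_var`) and the 2.5-unit block size are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): 400 points in a 10 x 10 area.
n = 400
xy = rng.uniform(0, 10, size=(n, 2))      # point coordinates
fault_dist = rng.uniform(0, 5, size=n)    # hypothetical distance to a fault
elevation = rng.normal(1500, 50, size=n)  # hypothetical elevation, no signal
random_var = rng.normal(size=n)           # randomly generated benchmark variable

# Assumed relationship for the demo: cave points are likelier near faults.
p_cave = 1 / (1 + np.exp(2.0 * (fault_dist - 2.5)))
y = (rng.uniform(size=n) < p_cave).astype(int)

names = ["fault_dist", "elevation", "random_var"]
X = np.column_stack([fault_dist, elevation, random_var])

# Assign each point to one of 16 grid blocks (4 x 4) so that nearby points
# always fall in the same fold; an assumed approximation of spatial CV.
col = np.floor(xy[:, 0] / 2.5).astype(int)
row = np.floor(xy[:, 1] / 2.5).astype(int)
blocks = col * 4 + row

accs, aucs, imps = [], [], []
for train, test in GroupKFold(n_splits=10).split(X, y, groups=blocks):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train], y[train])

    # Held-out accuracy and area under the ROC curve for this fold.
    accs.append(accuracy_score(y[test], model.predict(X[test])))
    aucs.append(roc_auc_score(y[test], model.predict_proba(X[test])[:, 1]))

    # Mean drop in held-out accuracy when each column is shuffled.
    result = permutation_importance(
        model, X[test], y[test], n_repeats=10, random_state=0
    )
    imps.append(result.importances_mean)

mean_acc = float(np.mean(accs))
mean_auc = float(np.mean(aucs))
importance = dict(zip(names, np.mean(imps, axis=0)))

# Variables that score below the random benchmark are candidates for removal.
below_random = [v for v in names if importance[v] < importance["random_var"]]

print(f"accuracy={mean_acc:.3f}  AUC={mean_auc:.3f}")
print(importance)
print("below random benchmark:", below_random)
```

Scoring importance on the held-out fold, rather than on the training data, keeps the estimate tied to predictive performance; the random column gives a simple reference level for deciding which predictors contribute nothing measurable.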
Source journal: Journal of Archaeological Method and Theory
CiteScore: 6.30
Self-citation rate: 8.70%
Articles published: 43
About the journal: The Journal of Archaeological Method and Theory, the leading journal in its field, presents original articles that address method- or theory-focused issues of current archaeological interest and represent significant explorations on the cutting edge of the discipline. The journal also welcomes topical syntheses that critically assess and integrate research on a specific subject in archaeological method or theory, as well as examinations of the history of archaeology. Written by experts, the articles benefit an international audience of archaeologists, students of archaeology, and practitioners of closely related disciplines. Specific topics covered in recent issues include the use of niche construction theory in archaeology, new developments in the use of soil chemistry in archaeological interpretation, and a model for the prehistoric development of clothing. The Journal's distinguished Editorial Board includes archaeologists with worldwide archaeological knowledge (the Americas, Asia and the Pacific, Europe, and Africa) and expertise in a wide range of methodological and theoretical issues. The Journal of Archaeological Method and Theory is rated 'A' in the European Reference Index for the Humanities (ERIH), a new reference index that aims to help evenly assess the scientific quality of Humanities research output. For more information, visit: http://www.esf.org/research-areas/humanities/activities/research-infrastructures.html It is also rated 'A' in the Australian Research Council Humanities and Creative Arts Journal List. For more information, visit: http://www.arc.gov.au/era/journal_list_dev.htm