On bootstrapping, data overfitting and crocodiles: an additional comment to McPherron et al. (2022)

IF 2.1 · Zone 2 (Earth Sciences) · JCR Q1 (Anthropology)

Archaeological and Anthropological Sciences. Pub Date: 2025-02-18. DOI: 10.1007/s12520-025-02183-w

Manuel Domínguez-Rodrigo, Enrique Baquedano

Archaeological and Anthropological Sciences, volume 17, issue 3. Journal Article. Full text: https://link.springer.com/article/10.1007/s12520-025-02183-w (PDF: https://link.springer.com/content/pdf/10.1007/s12520-025-02183-w.pdf)

Citations: 0

Abstract

Quaternary hominin-carnivore interactions are best reconstructed taphonomically through the use of bone surface modifications (BSM). This study examines redundancy in an experimental dataset of potentially similar BSM created by crocodile tooth-marking, sedimentary trampling and stone tool cut marking (Domínguez-Rodrigo and Baquedano in Sci Rep 8:5786, 2018). The original analysis of this experimental set, which aimed to classify the three types of BSM with confidence, was criticized by McPherron et al. (J Hum Evol 164:103071, 2022), who insinuated that the analysis was flawed by potential methodological overfitting caused by improper use of the bootstrap. A subsequent response to that critique (Abellán et al. in Geobios Memoire Special. 72–73, 12–21, 2022) showed that the results did not differ between the raw data and the bootstrapped data. It was argued that the structural covariance and redundancy of the categorical dataset were responsible for the highly accurate models; however, this was never empirically demonstrated. Here, we show that the original experimental dataset is saturated with redundancy. Our analysis revealed that, out of 633 cases, only 116 (18.3%) were unique in the complete dataset, 45 (7.1%) in the intrinsic variable dataset, and just four (0.63%) in the three-variable dataset (accounting for most of the sample variance). Redundancy, therefore, ranged from 81.7% to over 99%. Machine learning analysis using Random Forest (RF) and C5.0 algorithms on the datasets demonstrated high accuracy with the raw data (90-98%). Proper bootstrapping yielded nearly identical accuracy (88-98%), while improper bootstrapping slightly reduced accuracy (86-98%) and introduced some degree of underfitting. This underscores that the potential biasing effects of bootstrapping differ between numerical and categorical datasets, especially those with low dimensionality and low cardinality, in situations of feature interdependence and covariance. A complementary approach, an iterative data-partitioning method based on train-test resampling, reproduced the results derived from the bootstrapped samples. Understanding these methodological processes is essential for an adequate application of these experimental models to the fossil record.
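The redundancy percentages quoted in the abstract follow directly from the counts of unique cases, and a short sketch can make the arithmetic concrete. The Python below is only an illustration, not the authors' code: it assumes that "unique" means a distinct combination of categorical values, and the helper names (`redundancy`, `train_test_overlap`) and the four-pattern toy dataset are hypothetical stand-ins for the real experimental data, which is not reproduced here.

```python
import random


def redundancy(rows):
    """Summarise how many distinct value combinations a categorical dataset holds.

    `rows` is a sequence of hashable records (e.g. tuples of category labels).
    Returns (n_distinct, pct_distinct, pct_redundant).
    Assumption: "unique" is read as "distinct combination of values".
    """
    n_total = len(rows)
    n_distinct = len(set(rows))
    pct_distinct = 100.0 * n_distinct / n_total
    return n_distinct, pct_distinct, 100.0 - pct_distinct


def train_test_overlap(rows, test_fraction=0.3, seed=0):
    """Fraction of held-out rows whose exact pattern also occurs in the training rows."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    test, train = shuffled[:n_test], shuffled[n_test:]
    train_patterns = set(train)
    return sum(row in train_patterns for row in test) / len(test)


# Arithmetic check on the counts reported in the abstract (633 cases in total).
N_CASES = 633
REPORTED_DISTINCT = {"complete": 116, "intrinsic": 45, "three-variable": 4}
for name, n_distinct in REPORTED_DISTINCT.items():
    pct = 100.0 * n_distinct / N_CASES
    print(f"{name}: {pct:.2f}% distinct, {100.0 - pct:.2f}% redundant")

# Hypothetical stand-in for the three-variable dataset: 633 rows, 4 patterns.
# The category labels are invented for illustration only.
rng = random.Random(42)
PATTERNS = [("a", "x", "p"), ("a", "y", "p"), ("b", "x", "q"), ("b", "y", "q")]
toy_rows = [rng.choice(PATTERNS) for _ in range(N_CASES)]

print(redundancy(toy_rows))
print(train_test_overlap(toy_rows))
```

In a dataset with this few distinct patterns, a plain train-test split already places copies of essentially every held-out pattern in the training partition. This is one plausible reading of why the abstract reports similar accuracy for raw and bootstrapped data, although the sketch itself does not fit any classifier.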




Source journal

Archaeological and Anthropological Sciences (Geosciences, Multidisciplinary)

CiteScore: 4.80

Self-citation rate: 18.20%

Articles published: 199

Journal description: Archaeological and Anthropological Sciences covers the full spectrum of natural scientific methods, with an emphasis on the archaeological contexts and the questions being studied. It bridges the gap between archaeologists and natural scientists, providing a forum to encourage the continued integration of scientific methodologies in archaeological research. Coverage in the journal includes: archaeology, geology/geophysical prospection, geoarchaeology, geochronology, palaeoanthropology, archaeozoology and archaeobotany, genetics and other biomolecules, material analysis and conservation science. The journal is endorsed by the German Society of Natural Scientific Archaeology and Archaeometry (GNAA), the Hellenic Society for Archaeometry (HSC), the Association of Italian Archaeometrists (AIAr) and the Society of Archaeological Sciences (SAS).