Correction to “Predicting intraspecific trait variation among California's grasses”

IF 5.3 · Zone 1, Environmental Science and Ecology · Q1 ECOLOGY

Journal of Ecology · Pub Date: 2024-12-10 · DOI: 10.1111/1365-2745.14466


Citations: 0

Abstract

Sandel, B., Pavelka, C., Hayashi, T., et al. (2021) Predicting intraspecific trait variation among California's grasses. Journal of Ecology, 109, 2662–2677. https://doi.org/10.1111/1365-2745.13673.

In the paper by Sandel et al. (2021), an error has been identified in the code.

The error was in generating the testing data subset for assessing random forest fit, causing it to not be independent of the training dataset. This affects Figures 3-5, and the corrected versions of these are included below. Table S3 has been updated in the article. The updated text referring to these figures in the section ‘3.2 Modelling ITV’ is also included below. The changes do not fundamentally alter the message of the paper.
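The notice does not say how the overlap between testing and training data arose, nor how it was fixed. As a rough illustration of the property that was violated, the sketch below splits row indices into two disjoint subsets and checks that no row appears in both. The function name `split_indices`, the 25% test fraction and the seed are assumptions made for this example, not details from the paper.

```python
import random


def split_indices(n_rows, test_fraction=0.25, seed=0):
    """Return disjoint (train, test) lists of row indices covering range(n_rows).

    Hypothetical helper: the 25% default test fraction is an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    # Independence check: a row used for training must never be used for testing.
    assert not set(train) & set(test), "training and testing rows overlap"
    return train, test


train_idx, test_idx = split_indices(100, test_fraction=0.25, seed=1)
print(len(train_idx), "training rows,", len(test_idx), "testing rows")
```

Running an explicit overlap check of this kind after every split is cheap. Where samples are grouped (for example, several samples per site), splitting by group rather than by row may be the safer choice.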

**FIGURE 3**

Improvements in model performance when adding variable groupings. Model performance was measured as the correlation between observed and predicted delta-trait values in the testing dataset. For each variable group, we take the mean performance of all models that included that variable group minus the mean performance of all models that excluded it. Climate variables were mean annual temperature and annual precipitation; local traits were local measures of specific leaf area (SLA), height or leaf area (LA) at a site, excluding the predicted measure (e.g. models predicting SLA were trained on height and LA); species traits were the overall species means of SLA, height and LA; phylogeny was the first five phylogenetic eigenvector maps; and species name was a categorical variable giving the species name.
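The contribution measure in the Figure 3 caption can be sketched as follows: score a model for every non-empty subset of the five variable groups, then, for each group, subtract the mean score of the subsets without it from the mean score of the subsets with it. The group labels, `TOY_WEIGHTS` and `toy_score` are placeholders invented for this example. In the paper each score would instead be the testing-set correlation of a fitted random forest.

```python
from itertools import combinations
from statistics import mean

GROUPS = ["climate", "local_traits", "species_traits", "phylogeny", "species_name"]

# Made-up additive effects, used only so the example runs end to end.
TOY_WEIGHTS = {
    "climate": 0.15,
    "local_traits": 0.20,
    "species_traits": 0.05,
    "phylogeny": 0.05,
    "species_name": 0.03,
}


def all_subsets(groups):
    """Yield every non-empty subset of the variable groups as a frozenset."""
    for size in range(1, len(groups) + 1):
        for combo in combinations(groups, size):
            yield frozenset(combo)


def toy_score(subset):
    """Placeholder for 'fit a model on this subset and score it on test data'."""
    return 0.2 + sum(TOY_WEIGHTS[g] for g in subset)


def group_contribution(performance, group):
    """Mean score of models including `group` minus mean score of models excluding it."""
    with_group = [s for subset, s in performance.items() if group in subset]
    without_group = [s for subset, s in performance.items() if group not in subset]
    return mean(with_group) - mean(without_group)


performance = {subset: toy_score(subset) for subset in all_subsets(GROUPS)}
contributions = {g: group_contribution(performance, g) for g in GROUPS}
for g in sorted(contributions, key=contributions.get, reverse=True):
    print(g, round(contributions[g], 3))
```

Because the two means are taken over different sets of models, the difference is not exactly the group's own effect even in this additive toy case. It is best read as a ranking of groups rather than an effect size.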

3.2 Modelling ITV

Across all specifications of the random forest models, performance scores were very similar on the training and testing data subsets (on average, differing by <0.09, Table S3), suggesting little overfitting. When applied to the testing dataset, random forests containing all five predictor groups predicted values that were well correlated with the observed trait values (delta-SLA: 0.74, SLA: 0.82, delta-Height: 0.67, Height: 0.88, delta-LA: 0.72, LA: 0.89; Table S3). Across all subsets of variable groups, other local traits (values of the non-focal traits from the local population, e.g. the height of the plants when predicting SLA) and climate were the most important groups for model performance (Figure 3). Species mean traits, name and phylogeny made smaller contributions to model fit. The performance of one such random forest, excluding the species predictor variable, is shown in Figure 4. The correlation between observed and predicted values is strong for both training and testing datasets. However, the observed–predicted relationships deviated somewhat from the 1:1 line, particularly for the delta-trait predictions. Standard major axis (SMA) regression slopes were less than 1, ranging from 0.65 to 0.73 for delta-trait models and between 0.82 and 0.92 for the final local trait predictions. These deviations indicate that the models tend to predict less extreme values for the most extreme trait observations.

**FIGURE 4**

Model fit for random forests predicting local trait values from climate, other local traits, species mean traits and phylogenetic position. Each point represents a sample of a grass species from a particular location. Error bars indicate standard errors for the predictions. Models predicting delta-trait values are attempting to predict deviation of an individual from its species mean (left column). Adding the species means to these predictions gives an overall estimate of the trait value for an individual (right column).
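Section 3.2 reports two fit statistics: the correlation between observed and predicted values, and the standard major axis (SMA) slope. The sketch below computes both using the standard SMA formula, slope = sign(r) · sd(y) / sd(x). It assumes predictions on the y-axis and observations on the x-axis, an orientation the notice does not state. The five data points are invented for illustration.

```python
from math import sqrt
from statistics import mean


def _sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy), the centred sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation between x and y."""
    sxx, syy, sxy = _sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)


def sma_slope(x, y):
    """Standard major axis slope of y on x: sign(r) * sd(y) / sd(x)."""
    sxx, syy, sxy = _sums_of_squares(x, y)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * sqrt(syy / sxx)


# Invented example: predictions are compressed towards zero relative to observations.
observed = [-2.0, -1.0, 0.0, 1.0, 2.0]
predicted = [-1.2, -0.9, 0.1, 0.6, 1.4]

r = pearson_r(observed, predicted)
slope = sma_slope(observed, predicted)
print(f"r = {r:.3f}, SMA slope = {slope:.3f}")
```

Under this orientation, a high correlation combined with a slope below 1 is the pattern described in the text: predictions track observations closely but are pulled towards the centre at the extremes.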

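A model including other local trait measurements and species names would be of limited use for predicting trait values of a plant in an unmeasured location. In contrast, the climate of that location is readily available, and phylogenetic relationships are known for most species. Thus, we focused on a reduced model including just these two variable groups and two species-level traits: the species mean value for the focal trait and the species' life span. Removing species names from the model had little impact (Table S3), but removing other local traits reduced model performance (Figure 5). For example, observed–predicted correlations for SLA, height and LA dropped to 0.79, 0.88 and 0.89, respectively. This likely reflects the fact that other local trait measurements can provide insight into local conditions that are not captured by our two broad climate predictors. Despite this modest reduction, performance of the simplified model was still fairly high.

**FIGURE 5**

Model fit for random forests using only mean traits and phylogeny and trained on the entire dataset. Each point represents a sample of a grass species from a particular location. Error bars indicate standard errors. Models predicting delta-trait values are attempting to predict deviation of an individual from its species mean (left column). Adding the species means to these predictions gives an overall estimate of the trait value for an individual (right column).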
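The delta-trait values used throughout this section are deviations of a local measurement from its species mean, and adding the species mean back to a predicted delta gives the final local trait estimate. A minimal sketch of that bookkeeping follows. The species labels and SLA values are hypothetical, and the "predicted" deltas simply reuse the observed ones to show the round trip.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (species, local SLA) records; not data from the paper.
records = [
    ("sp_A", 21.0), ("sp_A", 25.0), ("sp_A", 23.0),
    ("sp_B", 14.0), ("sp_B", 16.5),
]


def species_means(rows):
    """Mean trait value per species."""
    by_species = defaultdict(list)
    for species, value in rows:
        by_species[species].append(value)
    return {species: mean(values) for species, values in by_species.items()}


def delta_traits(rows, means):
    """Deviation of each local measurement from its species mean."""
    return [(species, value - means[species]) for species, value in rows]


def reconstruct(predicted_deltas, means):
    """Add the species mean back to each predicted delta to get a local trait value."""
    return [(species, means[species] + delta) for species, delta in predicted_deltas]


means = species_means(records)
deltas = delta_traits(records, means)
# A fitted model would supply predicted deltas; the observed ones stand in here.
estimates = reconstruct(deltas, means)
for (species, value), (_, estimate) in zip(records, estimates):
    print(species, value, round(estimate, 6))
```

The species mean carries most of the between-species variation. This is one plausible reason why correlations for the final trait values are higher than those for the delta-trait values alone.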

We apologise for this error.


Source journal

Journal of Ecology (Environmental Sciences, Ecology)

CiteScore: 10.90

Self-citation rate: 5.50%

Articles published: 207

Review time: 3.0 months

About the journal: Journal of Ecology publishes original research papers on all aspects of the ecology of plants (including algae), in both aquatic and terrestrial ecosystems. We do not publish papers concerned solely with cultivated plants and agricultural ecosystems. Studies of plant communities, populations or individual species are accepted, as well as studies of the interactions between plants and animals, fungi or bacteria, providing they focus on the ecology of the plants. We aim to bring important work using any ecological approach (including molecular techniques) to a wide international audience and therefore only publish papers with strong and ecological messages that advance our understanding of ecological principles.