Use and misuse of trait imputation in ecology: the problem of using out‐of‐context imputed values

IF 5.4 | CAS Zone 1, Environmental Science & Ecology | JCR Q1, Biodiversity Conservation
Ecography | Published: 2025-02-04 | DOI: 10.1111/ecog.07520
Lucas Damián Gorné, Jesús Aguirre-Gutiérrez, Fernanda C. Souza, Nathan G. Swenson, Nathan Jared Boardman Kraft, Beatriz Schwantes Marimon, Timothy R. Baker, Renato A. Ferreira de Lima, Emilio Vilanova, Esteban Álvarez-Dávila, Abel Monteagudo Mendoza, Gerardo Rafael Flores Llampazo, Rubens Manoel dos Santos, Gerhard Boenisch, Alejandro Araujo-Murakami, Gonzalo Rivas-Torres, Hirma Ramírez-Angulo, Nayane Cristina dos Santos Prestes, Paulo S. Morandi, Sabina Cerruto Ribeiro, Wesley Jonatar A. da Cruz, Mathias Disney, Anthony Di Fiore, Ben Hur Marimon-Junior, Ted R. Feldpausch, Yadvinder Malhi, Oliver L. Phillips, David Galbraith, Sandra Díaz
Citations: 0

Abstract

Despite progress in the measurement and accessibility of plant trait information, acquiring sufficiently complete data for enough species to answer broad-scale questions in plant functional ecology and biogeography remains challenging. A common way to overcome this challenge is imputation, or 'gap-filling', of trait values. This has proven appropriate when the focus is on the overall patterns emerging from the imputed database. However, some applications push imputation beyond its original scope. In these cases, imputed values are used independently of the imputation context, and specific trait values for a given species serve as inputs for computing new variables. We tested the performance of three widely used imputation methods (Bayesian hierarchical probabilistic matrix factorization, multiple imputation by chained equations with predictive mean matching, and Rphylopars) on a database of tropical tree and shrub traits. Using a leave-one-out procedure, we assessed the accuracy and precision of the imputed values. We found that out-of-context use of imputed values may bias the estimation of different variables. We also found that low redundancy in the dataset (i.e. low predictability of a new value from existing values) is likely the main cause of low accuracy and precision in the imputed values, and such low redundancy is not uncommon in empirical datasets. We therefore suggest using a leave-one-out procedure to test the quality of imputed values before any out-of-context application, and we make practical recommendations to avoid the misuse of imputation procedures. Furthermore, we recommend against publishing gap-filled datasets. Authors should instead publish only the empirical data, together with the imputation method applied and the corresponding script to reproduce the imputation. This will help prevent imputed data, whose accuracy, precision, and source are difficult to assess and track, from spreading into the public domain.
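The leave-one-out check recommended in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' code.

- An ordinary least-squares regression of each trait on the others stands in for the three methods the study tested (BHPMF, MICE with predictive mean matching, Rphylopars).
- The trait matrix is assumed to be complete, so that every cell can be hidden and re-predicted in turn.
- The function names and the simulated traits are hypothetical.

Comparing strongly correlated with uncorrelated simulated traits shows how redundancy in the data limits how well a hidden value can be recovered.

```python
import numpy as np


def loo_imputation_errors(X):
    """Leave-one-out check of imputed values on a complete trait matrix X.

    Each cell X[i, j] is hidden in turn and re-predicted. The imputer is a
    stand-in (assumption): ordinary least squares of trait j on the other
    traits, fitted on every species except i. It is NOT BHPMF, MICE-PMM or
    Rphylopars; substitute the method actually used.
    """
    n, p = X.shape
    errors = np.empty((n, p))
    for i in range(n):
        train = np.arange(n) != i  # all species except the one being tested
        for j in range(p):
            others = [k for k in range(p) if k != j]
            design = np.column_stack([np.ones(n - 1), X[train][:, others]])
            coef, *_ = np.linalg.lstsq(design, X[train, j], rcond=None)
            predicted = coef[0] + X[i, others] @ coef[1:]
            errors[i, j] = predicted - X[i, j]  # imputed minus observed
    return errors


def summarize(errors):
    """Per-trait bias (mean error) and RMSE of the leave-one-out predictions."""
    bias = errors.mean(axis=0)
    rmse = np.sqrt((errors ** 2).mean(axis=0))
    return bias, rmse


def simulate_traits(n, p, rho, seed):
    """Hypothetical standardized traits with common pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    return rng.multivariate_normal(np.zeros(p), cov, size=n)


if __name__ == "__main__":
    for rho in (0.9, 0.0):  # high versus no redundancy among traits
        X = simulate_traits(n=200, p=4, rho=rho, seed=1)
        bias, rmse = summarize(loo_imputation_errors(X))
        print(f"rho={rho}: bias={np.round(bias, 2)}, RMSE={np.round(rmse, 2)}")
```

With highly redundant traits (rho = 0.9), hidden values are recovered much better than with independent traits (rho = 0.0). In the independent case, the prediction is little better than the trait mean. Per-species errors of that size would propagate into any new variable computed from an individual imputed value.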
Source journal: Ecography (category: Environmental Science, Ecology)
CiteScore: 11.60
Self-citation rate: 3.40%
Articles published: 122
Review time: 8-16 weeks
Journal description: ECOGRAPHY publishes exciting, novel, and important articles that significantly advance understanding of ecological or biodiversity patterns in space or time. Papers focusing on conservation or restoration are welcomed, provided they are anchored in ecological theory and convey a general message that goes beyond a single case study. We encourage papers that seek to advance the field through the development and testing of theory or methodology, or by proposing new tools for analysis or interpretation of ecological phenomena. Manuscripts are expected to address general principles in ecology, though they may do so using a specific model system if they adequately frame the problem relative to a generalized ecological question or problem. Purely descriptive papers are considered only if they break new ground and/or describe patterns seldom explored. Studies focused on a single species or single location are generally discouraged unless they make a significant contribution to advancing general theory or understanding of biodiversity patterns and processes. Manuscripts merely confirming or marginally extending results of previous work are unlikely to be considered in Ecography. Papers are judged by their originality, general interest, and contribution to new developments in studies of spatial and temporal ecological patterns. There are no biases with regard to taxon, biome, or biogeographical area.