occTest: An integrated approach for quality control of species occurrence data

IF 6.3 | Zone 1, Environmental Science and Ecology | JCR Q1, ECOLOGY
Josep M. Serra-Diaz, Jeremy Borderieux, Brian Maitner, Coline C. F. Boonman, Daniel Park, Wen-Yong Guo, Arnaud Callebaut, Brian J. Enquist, Jens-C. Svenning, Cory Merow
Global Ecology and Biogeography, volume 33, issue 7. Journal Article. Published 2024-04-12. DOI: 10.1111/geb.13847. https://onlinelibrary.wiley.com/doi/10.1111/geb.13847
Citations: 0

Abstract

Aim

Species occurrence data are valuable information that enables one to estimate geographical distributions, characterize niches and their evolution, and guide spatial conservation planning. Rapid increases in species occurrence data stem from increasing digitization and aggregation efforts, and citizen science initiatives. However, persistent quality issues in occurrence data can impact the accuracy of scientific findings, underscoring the importance of filtering erroneous occurrence records in biodiversity analyses.

Innovation

We introduce an R package, occTest, that synthesizes a growing open-source ecosystem of biodiversity cleaning workflows to prepare occurrence data for different modelling applications. It offers a structured set of algorithms to identify potential problems with species occurrence records by employing a hierarchical organization of multiple tests. The workflow has a hierarchical structure organized in testPhases (i.e. cleaning vs. testing) that encompass different testBlocks grouping different testTypes (e.g. environmental outlier detection), which may use different testMethods (e.g. Rosner test, jackknife, etc.). Four different testBlocks characterize potential problems in geographic, environmental, human influence and temporal dimensions. Filtering and plotting functions are incorporated to facilitate the interpretation of tests. We provide examples with different data sources, with default and user-defined parameters. Compared to other available tools and workflows, occTest offers a comprehensive suite of integrated tests, and allows multiple methods associated with each test to explore consensus among data cleaning methods. It uniquely incorporates both coordinate accuracy analysis and environmental analysis of occurrence records. Furthermore, it provides a hierarchical structure to incorporate future tests yet to be developed.
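
The hierarchy of testPhases, testBlocks, testTypes and testMethods, and the idea of consensus among methods, can be illustrated with a minimal Python sketch. This is not the occTest API (occTest is an R package). The function names, the nested-dictionary layout, the sample temperatures, and the two outlier methods (a z-score rule and an interquartile-range fence, used here as simple stand-ins for methods such as the Rosner test) are all assumptions made for illustration.

```python
from statistics import mean, pstdev, quantiles
from typing import Callable, Dict, List, Tuple

# A testMethod receives every value of one variable and returns one flag per
# record (True = the record looks problematic under this method).
TestMethod = Callable[[List[float]], List[bool]]


def zscore_flags(values: List[float], threshold: float = 2.0) -> List[bool]:
    """Stand-in method 1: flag values more than `threshold` SDs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def iqr_flags(values: List[float], k: float = 1.5) -> List[bool]:
    """Stand-in method 2: flag values outside the k * IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


# Hypothetical layout: testPhase -> testBlock -> testType -> [testMethods].
HIERARCHY: Dict[str, Dict[str, Dict[str, List[TestMethod]]]] = {
    "testing": {
        "environmental": {
            "environmental outlier detection": [zscore_flags, iqr_flags],
        },
    },
}


def run_tests(hierarchy, values: List[float]) -> Dict[Tuple[str, str, str], List[float]]:
    """Return, per testType, the fraction of its methods that flag each record."""
    consensus = {}
    for phase, blocks in hierarchy.items():
        for block, test_types in blocks.items():
            for test_type, methods in test_types.items():
                flags = [method(values) for method in methods]
                consensus[(phase, block, test_type)] = [
                    sum(column) / len(methods) for column in zip(*flags)
                ]
    return consensus


# Mean annual temperature (illustrative values) at eight occurrence records;
# the last record sits far from the others in environmental space.
temperatures = [10.0, 11.0, 10.5, 9.5, 10.2, 10.8, 9.9, 30.0]
scores = run_tests(HIERARCHY, temperatures)
for key, per_record in scores.items():
    print(key, per_record)
```

A new testType or testMethod is added by extending the dictionary, which mirrors the stated goal of a structure that can absorb tests yet to be developed. A record's score for a testType is the share of methods that flag it, so a score of 1.0 means every method agrees the record is suspect.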

Main conclusions

occTest will help users understand the quality and quantity of data available before the start of data analysis, while also enabling users to filter data using either predefined rules or custom-built rules. As a result, occTest can better assess each record's appropriateness for its intended application.
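
Filtering with predefined or custom-built rules can be sketched as thresholding per-block scores. The Python below is an illustration under stated assumptions, not the package's own filtering function: the preset names and threshold values, the `keep_record` helper, and the example records are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical presets: the maximum share of flagging tests a record may have
# in any testBlock and still be kept.
PRESETS: Dict[str, float] = {"relaxed": 0.7, "majority": 0.5, "strict": 0.2}


def keep_record(
    block_scores: Dict[str, float],
    preset: str = "majority",
    custom: Optional[Dict[str, float]] = None,
) -> bool:
    """Keep a record only if every block score is within its threshold.

    `custom` overrides the preset threshold for the blocks it names.
    """
    default = PRESETS[preset]
    overrides = custom or {}
    return all(
        score <= overrides.get(block, default)
        for block, score in block_scores.items()
    )


def filter_records(records: List[dict], **rule) -> List[str]:
    """Return the ids of the records that pass the chosen rule."""
    return [r["id"] for r in records if keep_record(r["scores"], **rule)]


# Illustrative records with the share of tests flagging them in two blocks.
records = [
    {"id": "a", "scores": {"geographic": 0.0, "environmental": 0.0}},
    {"id": "b", "scores": {"geographic": 0.6, "environmental": 0.0}},
    {"id": "c", "scores": {"geographic": 0.0, "environmental": 1.0}},
]

kept_relaxed = filter_records(records, preset="relaxed")
kept_strict = filter_records(records, preset="strict")
# Custom rule: strict everywhere, but tolerate any environmental score, e.g.
# for an application where environmental outliers are of interest.
kept_custom = filter_records(records, preset="strict", custom={"environmental": 1.0})

print(kept_relaxed, kept_strict, kept_custom)
```

The same scored records yield different retained sets under different rules, which is the sense in which a record's appropriateness depends on the intended application.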

Source journal
Global Ecology and Biogeography (Environmental Sciences: Ecology)
CiteScore: 12.10
Self-citation rate: 3.10%
Articles published: 170
Review time: 3 months
About the journal: Global Ecology and Biogeography (GEB) welcomes papers that investigate broad-scale (in space, time and/or taxonomy), general patterns in the organization of ecological systems and assemblages, and the processes that underlie them. In particular, GEB welcomes studies that use macroecological methods, comparative analyses, meta-analyses, reviews, spatial analyses and modelling to arrive at general, conceptual conclusions. Studies in GEB need not be global in spatial extent, but the conclusions and implications of the study must be relevant to ecologists and biogeographers globally, rather than being limited to local areas, or specific taxa. Similarly, GEB is not limited to spatial studies; we are equally interested in the general patterns of nature through time, among taxa (e.g., body sizes, dispersal abilities), through the course of evolution, etc. Further, GEB welcomes papers that investigate general impacts of human activities on ecological systems in accordance with the above criteria.