Josep M. Serra-Diaz, Jeremy Borderieux, Brian Maitner, Coline C. F. Boonman, Daniel Park, Wen-Yong Guo, Arnaud Callebaut, Brian J. Enquist, Jens-C. Svenning, Cory Merow
Global Ecology and Biogeography, 33(7). Published 2024-04-12. DOI: 10.1111/geb.13847. https://onlinelibrary.wiley.com/doi/10.1111/geb.13847
occTest: An integrated approach for quality control of species occurrence data
Aim
Species occurrence data are valuable information that enables one to estimate geographical distributions, characterize niches and their evolution, and guide spatial conservation planning. Rapid increases in species occurrence data stem from increasing digitization and aggregation efforts, and citizen science initiatives. However, persistent quality issues in occurrence data can impact the accuracy of scientific findings, underscoring the importance of filtering erroneous occurrence records in biodiversity analyses.
Innovation
We introduce an R package, occTest, that synthesizes a growing open-source ecosystem of biodiversity cleaning workflows to prepare occurrence data for different modelling applications. It offers a structured set of algorithms that identify potential problems with species occurrence records through a hierarchical organization of multiple tests. The workflow is organized into testPhases (i.e. cleaning vs. testing), which encompass testBlocks that group testTypes (e.g. environmental outlier detection), each of which may use different testMethods (e.g. Rosner test, jackknife, etc.). Four testBlocks characterize potential problems in the geographic, environmental, human influence and temporal dimensions. Filtering and plotting functions are incorporated to facilitate the interpretation of tests. We provide examples with different data sources, using both default and user-defined parameters. Compared to other available tools and workflows, occTest offers a comprehensive suite of integrated tests, and allows multiple methods to be associated with each test so that consensus among data cleaning methods can be explored. It uniquely incorporates both coordinate accuracy analysis and environmental analysis of occurrence records. Furthermore, its hierarchical structure can incorporate tests that are yet to be developed.
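The hierarchy and the idea of consensus among methods can be illustrated with a minimal Python sketch. This is not occTest's R interface: the dictionary layout, the function names (`iqr_flags`, `mad_flags`, `consensus`) and the 0.5 threshold are all hypothetical, and the two outlier rules (an interquartile-range rule and a median-absolute-deviation rule) are simple stand-ins for the testMethods named above, not implementations of them.

```python
from statistics import median, quantiles


def iqr_flags(values, k=1.5):
    """Stand-in testMethod: flag values outside Q1 - k*IQR .. Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def mad_flags(values, cutoff=3.5):
    """Stand-in testMethod: flag values with a large modified z-score."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > cutoff for v in values]


# Hypothetical layout mirroring testPhase > testBlock > testType > testMethod.
HIERARCHY = {
    "testing": {                                  # testPhase
        "environmental": {                        # testBlock
            "environmental outlier detection": {  # testType
                "iqr_rule": iqr_flags,            # testMethods (stand-ins)
                "mad_rule": mad_flags,
            }
        }
    }
}


def consensus(values, methods):
    """Per-record fraction of methods that flag the record."""
    votes = [fn(values) for fn in methods.values()]
    return [sum(column) / len(votes) for column in zip(*votes)]


# Toy data: mean annual temperature (degrees C) at seven occurrence records.
temps = [11.8, 12.1, 12.4, 12.0, 11.9, 12.3, 25.0]
methods = HIERARCHY["testing"]["environmental"]["environmental outlier detection"]
scores = consensus(temps, methods)

# User-defined rule (an assumption): drop records flagged by at least half the methods.
kept = [t for t, s in zip(temps, scores) if s < 0.5]
print(scores)
print(kept)
```

Keeping the per-method votes separate from the filtering rule is the point of the sketch: the same scores can be reused with a stricter or looser threshold depending on the intended application.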
Main conclusions
occTest will help users understand the quality and quantity of the data available before analysis begins, while also enabling them to filter data using either predefined or custom-built rules. As a result, occTest supports a better assessment of each record's appropriateness for its intended application.
About the journal:
Global Ecology and Biogeography (GEB) welcomes papers that investigate broad-scale (in space, time and/or taxonomy), general patterns in the organization of ecological systems and assemblages, and the processes that underlie them. In particular, GEB welcomes studies that use macroecological methods, comparative analyses, meta-analyses, reviews, spatial analyses and modelling to arrive at general, conceptual conclusions. Studies in GEB need not be global in spatial extent, but the conclusions and implications of the study must be relevant to ecologists and biogeographers globally, rather than being limited to local areas, or specific taxa. Similarly, GEB is not limited to spatial studies; we are equally interested in the general patterns of nature through time, among taxa (e.g., body sizes, dispersal abilities), through the course of evolution, etc. Further, GEB welcomes papers that investigate general impacts of human activities on ecological systems in accordance with the above criteria.