Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling

IF 1.2 · Region 4 (Mathematics) · JCR Q4: OPERATIONS RESEARCH & MANAGEMENT SCIENCE

Sort-Statistics and Operations Research Transactions Pub Date : 2017-06-21 DOI:10.2436/20.8080.02.49

A. Chao, Robert K. Colwell

Publication type: Journal Article · Pages: 3-54

Citations: 54


Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling

In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the estimator for the resulting species richness. (The “Chao1” estimator refers to a similar type of estimator based on species abundance data.) Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD.
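The abstract describes inferring the number of undetected species from the counts of uniques and duplicates. A minimal Python sketch of that idea follows, assuming the commonly cited form of the Chao2 estimator, S_obs + ((T-1)/T) · Q1² / (2·Q2), with the usual bias-corrected fallback when no duplicates are observed. The function name `chao2` and the input layout (one 0/1 row per species, one column per sampling unit) are illustrative assumptions; this is not the SpadeR or iNEXT implementation.

```python
from typing import Sequence


def chao2(incidence: Sequence[Sequence[int]]) -> float:
    """Chao2 lower-bound richness estimate from a species-by-unit 0/1 matrix.

    Assumed layout (hypothetical): each row is one species, each column one
    sampling unit; a nonzero entry means the species was detected in that unit.
    """
    t = len(incidence[0])  # T: number of sampling units
    # Incidence frequency of each species = number of units where it was detected.
    freqs = [sum(1 for x in row if x) for row in incidence]

    s_obs = sum(1 for f in freqs if f > 0)   # observed species richness
    q1 = sum(1 for f in freqs if f == 1)     # uniques
    q2 = sum(1 for f in freqs if f == 2)     # duplicates

    k = (t - 1) / t
    if q2 > 0:
        undetected = k * q1 * q1 / (2 * q2)
    else:
        # Fallback commonly used when there are no duplicates (assumption here).
        undetected = k * q1 * (q1 - 1) / 2
    return s_obs + undetected


# Toy data (made up): 6 species over 4 sampling units.
toy = [
    [1, 0, 0, 0],  # unique
    [0, 1, 0, 0],  # unique
    [1, 1, 0, 0],  # duplicate
    [1, 1, 1, 1],
    [0, 0, 1, 0],  # unique
    [0, 0, 0, 1],  # unique
]
estimate = chao2(toy)
```

With four uniques and one duplicate, the estimated number of undetected species in this toy example is large relative to the six observed, which reflects how strongly the lower bound responds to rare species.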


Source journal

Sort-Statistics and Operations Research Transactions (Management Science: Statistics & Probability)

CiteScore: 3.10

Self-citation rate: 0.00%

Articles published: not listed

Review time: >12 weeks

About the journal: SORT (Statistics and Operations Research Transactions), formerly Qüestiió, is an international journal launched in 2003. It is published twice yearly, in English, by the Statistical Institute of Catalonia (Idescat). The journal is co-edited by the Universitat Politècnica de Catalunya, Universitat de Barcelona, Universitat Autònoma de Barcelona, Universitat de Girona, Universitat Pompeu Fabra, and Universitat de Lleida, with the co-operation of the Spanish Section of the International Biometric Society and the Catalan Statistical Society. SORT promotes the publication of original articles of a methodological or applied nature, or motivated by an applied problem, in statistics, operations research, official statistics, or biometrics, as well as book reviews. Authors are encouraged to include an example of a real data set in their manuscripts.