计数数据模型中数据聚合对离散估计的影响。

IF 1.2 4区数学 Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics Pub Date : 2021-05-07 DOI:10.1515/ijb-2020-0079

Adam Errington, Jochen Einbeck, Jonathan Cumming, Ute Rössler, David Endesfelder

{"title":"计数数据模型中数据聚合对离散估计的影响。","authors":"Adam Errington, Jochen Einbeck, Jonathan Cumming, Ute Rössler, David Endesfelder","doi":"10.1515/ijb-2020-0079","DOIUrl":null,"url":null,"abstract":"For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation have received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by γ-H2AX foci data as used for instance in radiation biodosimetry for the calibration of dose-response curves.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"183-202"},"PeriodicalIF":1.2000,"publicationDate":"2021-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijb-2020-0079","citationCount":"6","resultStr":"{\"title\":\"The effect of data aggregation on dispersion estimates in count data models.\",\"authors\":\"Adam Errington, Jochen Einbeck, Jonathan Cumming, Ute Rössler, David Endesfelder\",\"doi\":\"10.1515/ijb-2020-0079\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation have received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by γ-H2AX foci data as used for instance in radiation biodosimetry for the calibration of dose-response curves.\",\"PeriodicalId\":49058,\"journal\":{\"name\":\"International Journal of Biostatistics\",\"volume\":\"18 1\",\"pages\":\"183-202\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2021-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1515/ijb-2020-0079\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Biostatistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1515/ijb-2020-0079\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Biostatistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/ijb-2020-0079","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 6

摘要

对于计数数据的建模，通常的做法是在某些子组或预测器配置上聚合原始数据。例如，辐射暴露的计数数据生物标志物就是这种情况。在泊松定律下，计数数据可以在不丢失泊松参数信息的情况下聚合，如果泊松假设放宽为准泊松，则计数数据仍然是正确的。然而，在生物剂量学中，尤其是在其他领域，准泊松模型的离散度估计在数据聚集下的表现如何的问题很少受到关注。事实上，对于具有无法解释的异质性的真实数据集，分散估计在聚合后可能会强烈增加，我们将在某些情况下明确地证明和量化这种效应。分散估计的增加意味着参数标准误差的膨胀，然而，通过与随机效应模型的比较，可以证明这是一种校正目的。用γ-H2AX焦点数据说明了这种现象，例如在辐射生物剂量学中用于校准剂量-响应曲线。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

The effect of data aggregation on dispersion estimates in count data models.

For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation have received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by γ-H2AX foci data as used for instance in radiation biodosimetry for the calibration of dose-response curves.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Biostatistics MATHEMATICAL & COMPUTATIONAL BIOLOGY-STATISTICS & PROBABILITY

CiteScore

2.10

自引率

8.30%

发文量

审稿时长

>12 weeks

期刊介绍： The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, as well as original applications of statistical methods, for important practical problems arising from the biological, medical, public health, and agricultural sciences with an emphasis on semiparametric methods. Given many alternatives to publish exist within biostatistics, IJB offers a place to publish for research in biostatistics focusing on modern methods, often based on machine-learning and other data-adaptive methodologies, as well as providing a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. IJB is intended that the journal cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows for data and software code to be appended, and opens the door for reproducible research allowing readers to easily replicate analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.