Fedelis Mutiso, John L. Pearce, Sara E. Benjamin-Neelon, Noel T. Mueller, Hong Li, Brian Neelon
{"title":"A Marginalized Zero-Inflated Negative Binomial Model for Spatial Data: Modeling COVID-19 Deaths in Georgia","authors":"Fedelis Mutiso, John L. Pearce, Sara E. Benjamin-Neelon, Noel T. Mueller, Hong Li, Brian Neelon","doi":"10.1002/bimj.202300182","DOIUrl":"10.1002/bimj.202300182","url":null,"abstract":"<div><p>Spatial count data with an abundance of zeros arise commonly in disease mapping studies. Typically, these data are analyzed using zero-inflated models, which comprise a mixture of a point mass at zero and an ordinary count distribution, such as the Poisson or negative binomial. However, due to their mixture representation, conventional zero-inflated models are challenging to explain in practice because the parameter estimates have conditional latent-class interpretations. As an alternative, several authors have proposed marginalized zero-inflated models that simultaneously model the excess zeros and the marginal mean, leading to a parameterization that more closely aligns with ordinary count models. Motivated by a study examining predictors of COVID-19 death rates, we develop a spatiotemporal marginalized zero-inflated negative binomial model that directly models the marginal mean, thus extending marginalized zero-inflated models to the spatial setting. To capture the spatiotemporal heterogeneity in the data, we introduce region-level covariates, smooth temporal effects, and spatially correlated random effects to model both the excess zeros and the marginal mean. For estimation, we adopt a Bayesian approach that combines full-conditional Gibbs sampling and Metropolis–Hastings steps. We investigate features of the model and use the model to identify key predictors of COVID-19 deaths in the US state of Georgia during the 2021 calendar year.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
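The abstract above contrasts conventional zero-inflated models with marginalized ones that model the marginal mean directly. A minimal sketch of the generic marginalized ZINB parameterization may help: if nu = E[Y] is the marginal mean and pi the excess-zero probability, the negative binomial component's mean is recovered as mu = nu / (1 - pi), so that (1 - pi) * mu = nu. This is an illustration under stated assumptions only: the log link for nu and logit link for pi, the helper name `params_from_linear_predictors`, and the size parameter `r` are assumptions, and the spatial random effects, temporal smooths, and Bayesian sampler described by the authors are not included.

```python
import math

def nb_pmf(y, mu, r):
    # Negative binomial pmf with mean mu > 0 and size r > 0,
    # so that Var(Y) = mu + mu**2 / r.
    return math.exp(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )

def mzinb_pmf(y, nu, pi, r):
    # Marginalized ZINB: nu is the marginal (overall) mean, pi the
    # excess-zero probability. The count component's mean is chosen so
    # that the mixture's overall mean equals nu.
    if not (nu > 0 and 0 <= pi < 1 and r > 0):
        raise ValueError("need nu > 0, 0 <= pi < 1, r > 0")
    mu = nu / (1 - pi)
    prob = (1 - pi) * nb_pmf(y, mu, r)
    # Zeros come either from the point mass or from the count component.
    return prob + pi if y == 0 else prob

def params_from_linear_predictors(x_beta, z_gamma):
    # Hypothetical helper: log link for the marginal mean,
    # logit link for the excess-zero probability.
    nu = math.exp(x_beta)
    pi = 1 / (1 + math.exp(-z_gamma))
    return nu, pi
```

Because regression coefficients enter through log(nu), they keep the familiar rate-ratio interpretation of an ordinary negative binomial model, which is the interpretive advantage the abstract describes.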
{"title":"Years of Life Lost to COVID-19 and Related Mortality Indicators: An Illustration in 30 Countries","authors":"Valentin Rousson, Isabella Locatelli","doi":"10.1002/bimj.202300386","DOIUrl":"10.1002/bimj.202300386","url":null,"abstract":"<p>The concept of (potential) years of life lost is a measure of premature mortality that can be used to compare the impacts of different specific causes of death. However, interpreting a given number of years of life lost at face value is more problematic because of the lack of a sensible reference value. In this paper, we propose three denominators to divide an excess years of life lost, thus obtaining three indicators, called <i>average life lost</i>, <i>increase of life lost</i>, and <i>proportion of life lost</i>, which should facilitate interpretation and comparisons. We study the links between these three indicators and classical mortality indicators, such as life expectancy and standardized mortality rate, introduce the concept of <i>weighted standardized mortality rate</i>, and calculate them in 30 countries to assess the impact of COVID-19 on mortality in the year 2020. Using any of the three indicators, a significant excess loss is found for both genders in 18 of the 30 countries.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300386","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Regression Techniques for Multiple Method Comparison and Transformation","authors":"Florian Dufey","doi":"10.1002/bimj.202400027","DOIUrl":"10.1002/bimj.202400027","url":null,"abstract":"<p>A generalization of Passing–Bablok regression is proposed for comparing multiple measurement methods simultaneously. Possible applications include assay migration studies or interlaboratory trials. When comparing only two methods, the method boils down to the usual Passing–Bablok estimator. It is close in spirit to reduced major axis regression, which is, however, not robust. To obtain a robust estimator, the major axis is replaced by the (hyper-)spherical median axis. This technique has been applied to compare SARS-CoV-2 serological tests, bilirubin in neonates, and an in vitro diagnostic test using different instruments, sample preparations, and reagent lots. In addition, plots similar to the well-known Bland–Altman plots have been developed to represent the variance structure.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202400027","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
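The abstract notes that, for two methods, the proposed estimator reduces to the usual Passing–Bablok estimator. A minimal sketch of that classical two-method case follows: the slope is a shifted median of all pairwise slopes (slopes of exactly -1 are discarded, and the shift K counts slopes below -1), and the intercept is the median of y - b*x. As a simplifying assumption, pairs with tied x values are skipped rather than assigned large signed slopes; the (hyper-)spherical median axis generalization and confidence intervals are not implemented.

```python
from statistics import median

def passing_bablok(x, y):
    """Simplified two-method Passing-Bablok fit; returns (intercept, slope)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    slopes = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0:
                # Simplification: skip pairs with tied x values.
                continue
            s = dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are discarded
            slopes.append(s)
    if not slopes:
        raise ValueError("no usable pairs")
    slopes.sort()
    N = len(slopes)
    K = sum(1 for s in slopes if s < -1)  # offset for the shifted median
    if N % 2 == 1:
        b = slopes[(N + 1) // 2 + K - 1]
    else:
        b = 0.5 * (slopes[N // 2 + K - 1] + slopes[N // 2 + K])
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b
```

The shift by K makes the estimator symmetric in the roles of the two methods, which is why Passing–Bablok, unlike ordinary least squares, does not treat one method as error-free.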
Camille Frévent, Mohamed-Salem Ahmed, Sophie Dabo-Niang, Michaël Genin
{"title":"A Shared-Frailty Spatial Scan Statistic Model for Time-to-Event Data","authors":"Camille Frévent, Mohamed-Salem Ahmed, Sophie Dabo-Niang, Michaël Genin","doi":"10.1002/bimj.202300200","DOIUrl":"10.1002/bimj.202300200","url":null,"abstract":"<p>Spatial scan statistics are well-known methods widely used to detect spatial clusters of events. Furthermore, several spatial scan statistics models have been applied to the spatial analysis of time-to-event data. However, these models do not take account of potential correlations between the observations of individuals within the same spatial unit or potential spatial dependence between spatial units. To overcome this problem, we have developed a scan statistic based on a Cox model with shared frailty that takes account of the spatial dependence between spatial units. In simulation studies, we found that (i) conventional models of spatial scan statistics for time-to-event data fail to maintain the type I error in the presence of a correlation between the observations of individuals within the same spatial unit and (ii) our model performed well in the presence of such correlation and spatial dependence. We have applied our method to epidemiological data and the detection of spatial clusters of mortality in patients with end-stage renal disease in northern France.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300200","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141581612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aude Allemang-Trivalle, Annabel Maruani, Bruno Giraudeau
{"title":"Sample Size Calculation for an Individual Stepped-Wedge Randomized Trial","authors":"Aude Allemang-Trivalle, Annabel Maruani, Bruno Giraudeau","doi":"10.1002/bimj.202300167","DOIUrl":"10.1002/bimj.202300167","url":null,"abstract":"<p>In the individual stepped-wedge randomized trial (ISW-RT), subjects are allocated to sequences, each sequence being defined by a control period followed by an experimental period. The total follow-up time is the same for all sequences, but the duration of the control and experimental periods varies among sequences. To our knowledge, there is no validated sample size calculation formula for ISW-RTs unlike stepped-wedge cluster randomized trials (SW-CRTs). The objective of this study was to adapt the formula used for SW-CRTs to the case of individual randomization and to validate this adaptation using a Monte Carlo simulation study. The proposed sample size calculation formula for an ISW-RT design yielded satisfactory empirical power for most scenarios except scenarios with operating characteristic values near the boundary (i.e., smallest possible number of periods, very high or very low autocorrelation coefficient). Overall, the results provide useful insights into the sample size calculation for ISW-RTs.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300167","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141581614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raphael O. Betschart, Cristian Riccio, Domingo Aguilera-Garcia, Stefan Blankenberg, Linlin Guo, Holger Moch, Dagmar Seidl, Hugo Solleder, Felix Thalén, Alexandre Thiéry, Raphael Twerenbold, Tanja Zeller, Martin Zoche, Andreas Ziegler
{"title":"Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control","authors":"Raphael O. Betschart, Cristian Riccio, Domingo Aguilera-Garcia, Stefan Blankenberg, Linlin Guo, Holger Moch, Dagmar Seidl, Hugo Solleder, Felix Thalén, Alexandre Thiéry, Raphael Twerenbold, Tanja Zeller, Martin Zoche, Andreas Ziegler","doi":"10.1002/bimj.202300278","DOIUrl":"10.1002/bimj.202300278","url":null,"abstract":"<p>Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before performing association analysis between phenotypes and genotypes, preprocessing and quality control (QC) of the raw sequence data need to be performed. Because many biostatisticians have not been working with WGS data so far, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics, which are applied to WGS data: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with the data from the GENEtic SequencIng Study Hamburg–Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one genome in a bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected Het/Hom ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time was linear in genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300278","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141581613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
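One QC metric highlighted above is the deviation from the expected Het/Hom ratio. A minimal sketch of how a per-sample ratio can be computed follows: heterozygous calls divided by homozygous-alternate calls, ignoring homozygous-reference and missing calls. The input format is an assumption (VCF-style diploid GT strings such as "0/1" or "1|1"); the function name is hypothetical, and the expected reference range used to flag outlying samples is not given in the abstract, so none is assumed here.

```python
def het_hom_ratio(genotypes):
    """Het/Hom ratio from VCF-style diploid GT strings (hypothetical helper)."""
    het = hom_alt = 0
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators alike.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call
        if len(set(alleles)) > 1:
            het += 1  # e.g. "0/1" or "1/2"
        elif alleles[0] != "0":
            hom_alt += 1  # e.g. "1/1"
        # Homozygous reference ("0/0") is not counted.
    if hom_alt == 0:
        raise ValueError("no homozygous-alternate calls")
    return het / hom_alt
```

In practice the ratio would be computed per sample across all called sites and compared against the distribution over the cohort, since a markedly elevated value can point to contamination.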
Siyuan Guo, Jiajia Zhang, Yichao Wu, Alexander C. McLain, James W. Hardin, Bankole Olatosi, Xiaoming Li
{"title":"Functional Multivariable Logistic Regression With an Application to HIV Viral Suppression Prediction","authors":"Siyuan Guo, Jiajia Zhang, Yichao Wu, Alexander C. McLain, James W. Hardin, Bankole Olatosi, Xiaoming Li","doi":"10.1002/bimj.202300081","DOIUrl":"10.1002/bimj.202300081","url":null,"abstract":"<p>Motivated by improving the prediction of the human immunodeficiency virus (HIV) suppression status using electronic health records (EHR) data, we propose a functional multivariable logistic regression model, which accounts for the longitudinal binary process and continuous process simultaneously. Specifically, the longitudinal measurements for either binary or continuous variables are modeled by functional principal components analysis, and their corresponding functional principal component scores are used to build a logistic regression model for prediction. The longitudinal binary data are linked to underlying Gaussian processes. The estimation is done using penalized splines for the longitudinal continuous and binary data. Group-lasso is used to select longitudinal processes, and the multivariate functional principal components analysis is proposed to revise functional principal component scores with the correlation. The method is evaluated via comprehensive simulation studies and then applied to predict viral suppression using EHR data for people living with HIV in South Carolina.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300081","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141536065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Partial True Discovery Guarantee Procedures","authors":"Ningning Xu, Aldo Solari, Jelle J. Goeman","doi":"10.1002/bimj.202300075","DOIUrl":"10.1002/bimj.202300075","url":null,"abstract":"<p>Closed testing has recently been shown to be optimal for simultaneous true discovery proportion control. It is, however, challenging to construct true discovery guarantee procedures in such a way that it focuses power on some feature sets chosen by users based on their specific interest or expertise. We propose a procedure that allows users to target power on prespecified feature sets, that is, “focus sets.” Still, the method also allows inference for feature sets chosen post hoc, that is, “nonfocus sets,” for which we deduce a true discovery lower confidence bound by interpolation. Our procedure is built from partial true discovery guarantee procedures combined with Holm's procedure and is a conservative shortcut to the closed testing procedure. A simulation study confirms that the statistical power of our method is relatively high for focus sets, at the cost of power for nonfocus sets, as desired. In addition, we investigate its power property for sets with specific structures, for example, trees and directed acyclic graphs. We also compare our method with AdaFilter in the context of replicability analysis. The application of our method is illustrated with a gene ontology analysis in gene expression data.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300075","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141494375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
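The procedure above is described as being built from partial true discovery guarantee procedures combined with Holm's procedure. As a sketch of just that one building block, here is Holm's step-down adjustment, which controls the family-wise error rate: the i-th smallest of m p-values is multiplied by (m - i + 1), and a running maximum keeps the adjusted values monotone. The combination with partial true discovery guarantees and the interpolated bounds for nonfocus sets are not shown; the function name is illustrative.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by (m - rank).
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity of the step-down sequence.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha; for example, `holm_adjust([0.01, 0.04, 0.03, 0.005])` gives approximately `[0.03, 0.06, 0.06, 0.02]`.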
Sören Budig, Klaus Jung, Mario Hasler, Frank Schaarschmidt
{"title":"Simultaneous Inference of Multiple Binary Endpoints in Biomedical Research: Small Sample Properties of Multiple Marginal Models and a Resampling Approach","authors":"Sören Budig, Klaus Jung, Mario Hasler, Frank Schaarschmidt","doi":"10.1002/bimj.202300197","DOIUrl":"10.1002/bimj.202300197","url":null,"abstract":"<p>In biomedical research, the simultaneous inference of multiple binary endpoints may be of interest. In such cases, an appropriate multiplicity adjustment is required that controls the family-wise error rate, which represents the probability of making incorrect test decisions. In this paper, we investigate two approaches that perform single-step <i>p</i>-value adjustments that also take into account the possible correlation between endpoints. A rather novel and flexible approach known as multiple marginal models is considered, which is based on stacking of the parameter estimates of the marginal models and deriving their joint asymptotic distribution. We also investigate a nonparametric vector-based resampling approach, and we compare both approaches with the Bonferroni method by examining the family-wise error rate and power for different parameter settings, including low proportions and small sample sizes. The results show that the resampling-based approach consistently outperforms the other methods in terms of power, while still controlling the family-wise error rate. The multiple marginal models approach, on the other hand, shows a more conservative behavior. However, it offers more versatility in application, allowing for more complex models or straightforward computation of simultaneous confidence intervals. The practical application of the methods is demonstrated using a toxicological dataset from the National Toxicology Program.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300197","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141494376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
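The abstract mentions a nonparametric vector-based resampling approach that performs single-step adjustments while accounting for correlation between endpoints. The authors' exact scheme is not described here, so the sketch below substitutes a related, commonly used technique: a single-step maxT permutation adjustment for two groups. Whole subject vectors are permuted between the groups, which preserves the within-subject correlation of the binary endpoints, and each endpoint's adjusted p-value is the share of permutations whose maximum statistic reaches its observed value. Using the absolute difference in proportions as the per-endpoint statistic, and the function name, are assumptions.

```python
import random

def maxt_adjusted_pvalues(group_a, group_b, n_perm=2000, seed=1):
    """Single-step maxT permutation p-values; each subject is a 0/1 vector."""
    k = len(group_a[0])

    def diffs(a, b):
        # Absolute difference in proportions for each endpoint.
        return [abs(sum(v[j] for v in a) / len(a) - sum(v[j] for v in b) / len(b))
                for j in range(k)]

    observed = diffs(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    exceed = [0] * k
    for _ in range(n_perm):
        # Shuffle whole subject vectors so endpoint correlation is kept.
        rng.shuffle(pooled)
        max_stat = max(diffs(pooled[:n_a], pooled[n_a:]))
        for j in range(k):
            if max_stat >= observed[j]:
                exceed[j] += 1
    # Add-one correction keeps every p-value strictly positive.
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

Comparing each observed statistic with the permutation distribution of the maximum is what yields single-step family-wise error control; when endpoints are positively correlated this maximum is smaller than a Bonferroni bound implies, which is the source of the power gain reported in the abstract.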