Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf092
Lingxiao Wang
{"title":"Using model-assisted calibration methods to improve efficiency of regression analyses using two-phase samples or pooled samples under complex survey designs.","authors":"Lingxiao Wang","doi":"10.1093/biomtc/ujaf092","DOIUrl":"10.1093/biomtc/ujaf092","url":null,"abstract":"<p><p>Two-phase sampling designs are frequently applied in epidemiological studies and large-scale health surveys. In such designs, certain variables are collected exclusively within a second-phase random subsample of the initial first-phase sample, often due to factors such as high costs, response burden, or constraints on data collection or assessment. Consequently, second-phase sample estimators can be inefficient due to the diminished sample size. Model-assisted calibration methods have been used to improve the efficiency of second-phase estimators in regression analysis. However, limited literature provides valid finite population inferences of the calibration estimators that use appropriate calibration auxiliary variables while simultaneously accounting for the complex sample designs in the first- and second-phase samples. Moreover, no literature considers the \"pooled design\" where some covariates are measured exclusively in certain repeated survey cycles. This paper proposes calibrating the sample weights for the second-phase sample to the weighted first-phase sample based on score functions of the regression model that uses predictions of the second-phase variable for the first-phase sample. We establish the consistency of estimation using calibrated weights and provide variance estimation for the regression coefficients under the two-phase design or the pooled design nested within complex survey designs. Empirical evidence highlights the efficiency and robustness of the proposed calibration compared to existing calibration and imputation methods. 
Data examples from the National Health and Nutrition Examination Survey are provided.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288669/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
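The core step the abstract describes, reweighting the second-phase sample so that weighted totals of first-phase auxiliary variables (in the paper, score functions built from predictions of the second-phase variable) are reproduced, can be illustrated with a minimal linear (GREG-type) calibration sketch. This is generic textbook calibration, not the paper's estimator, and `calibrate_weights` plus the example data are hypothetical names for illustration only.

```python
import numpy as np

def calibrate_weights(d, X, totals):
    """Linear (GREG-type) calibration: adjust design weights d so the
    weighted totals of the auxiliary matrix X match `totals` exactly.
    Returns w = d * (1 + X @ lam), with lam solved from the calibration
    equations  X^T diag(d) X lam = totals - X^T d."""
    d = np.asarray(d, float)
    X = np.asarray(X, float)
    A = X.T @ (d[:, None] * X)          # X^T diag(d) X
    b = totals - X.T @ d                # gap between targets and current totals
    lam = np.linalg.solve(A, b)
    return d * (1.0 + X @ lam)

# Hypothetical example: 50 second-phase units, two auxiliaries
# (an intercept and one score-like variable), known first-phase totals.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
d = np.full(50, 2.0)                    # initial design weights
totals = np.array([120.0, 5.0])         # first-phase weighted totals
w = calibrate_weights(d, X, totals)
```

After calibration, `X.T @ w` reproduces `totals` exactly, which is the defining property the paper's variance theory builds on.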
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf110
Tal Agassi, Nir Keret, Malka Gorfine
{"title":"Mastering rare event analysis: subsample-size determination in Cox and logistic regressions.","authors":"Tal Agassi, Nir Keret, Malka Gorfine","doi":"10.1093/biomtc/ujaf110","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf110","url":null,"abstract":"<p><p>In the realm of contemporary data analysis, the use of massive datasets has taken on heightened significance, albeit often entailing considerable demands on computational time and memory. While a multitude of existing works offer optimal subsampling methods for conducting analyses on subsamples with minimized efficiency loss, they notably lack tools for judiciously selecting the subsample size. To bridge this gap, our work introduces tools designed for choosing the subsample size. We focus on three settings: the Cox regression model for survival data with rare events, and logistic regression for both balanced and imbalanced datasets. Additionally, we present a new optimal subsampling procedure tailored to logistic regression with imbalanced data. The efficacy of these tools and procedures is demonstrated through an extensive simulation study and meticulous analyses of two sizable datasets: survival analysis of UK Biobank colorectal cancer data with about 350 million rows and logistic regression of linked birth and infant death data with about 28 million observations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
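The subsampling setting the abstract works in can be sketched as follows: fit a pilot logistic regression, then sample with probabilities proportional to an optimality score. The score used here, |y - p| times the covariate norm, is one common choice from the optimal-subsampling literature, not the authors' new procedure, and their subsample-size determination tools are not reproduced; `logistic_irls` and `subsample_probs` are illustrative names.

```python
import numpy as np

def logistic_irls(X, y, iters=8):
    # Plain Newton/IRLS fit of logistic regression (include a column of
    # ones in X if an intercept is wanted).
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (W[:, None] * X)      # Hessian
        g = X.T @ (y - p)               # score
        beta += np.linalg.solve(H, g)
    return beta

def subsample_probs(X, y, beta_pilot):
    # A-optimality-flavoured sampling scores |y - p| * ||x||, normalised
    # to probabilities (a standard rule; not the paper's exact proposal).
    p = 1.0 / (1.0 + np.exp(-X @ beta_pilot))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    return score / score.sum()

# Hypothetical rare-event data: intercept -2 gives roughly 12% events.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
beta_true = np.array([-2.0, 1.0, 0.5])
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
probs = subsample_probs(X, y, logistic_irls(X, y))
```

One would then draw the subsample with these probabilities and refit with inverse-probability weights; choosing how large that subsample must be is the gap the paper addresses.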
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf049
Malka Gorfine, David M Zucker, Shoval Shoham
{"title":"Cumulative incidence function estimation using population-based biobank data.","authors":"Malka Gorfine, David M Zucker, Shoval Shoham","doi":"10.1093/biomtc/ujaf049","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf049","url":null,"abstract":"<p><p>Many countries have established population-based biobanks, which are being used increasingly in epidemiological and clinical research. These biobanks offer opportunities for large-scale studies addressing questions beyond the scope of traditional clinical trials or cohort studies. However, using biobank data poses new challenges. Typically, biobank data are collected from a study cohort recruited over a defined calendar period, with subjects entering the study at various ages falling between $c_L$ and $c_U$. This work focuses on biobank data with individuals reporting disease-onset age upon recruitment, termed prevalent data, along with individuals initially recruited as healthy, and their disease onset observed during the follow-up period. We propose a novel cumulative incidence function (CIF) estimator that efficiently incorporates prevalent cases, in contrast to existing methods, providing two advantages: (1) increased efficiency and (2) CIF estimation for ages before the lower limit, $c_L$.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144783415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
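For readers unfamiliar with the estimand, a CIF under competing risks can be estimated nonparametrically by the standard Aalen-Johansen construction, sketched below for incident cases only. This baseline does not incorporate prevalent cases, which is precisely the efficiency gain the paper proposes; `cumulative_incidence` is an illustrative name.

```python
import numpy as np

def cumulative_incidence(time, event, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence function for
    `cause` under right censoring (event = 0 censored; 1, 2, ... causes).
    Returns event times and the CIF evaluated just after each."""
    order = np.argsort(time)
    time, event = np.asarray(time)[order], np.asarray(event)[order]
    surv = 1.0          # overall survival just before t (all causes)
    cif = 0.0
    times, values = [], []
    for t in np.unique(time[event > 0]):
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_any = np.sum((time == t) & (event > 0))
        cif += surv * d_cause / at_risk     # hazard for this cause times S(t-)
        surv *= 1.0 - d_any / at_risk       # update all-cause survival
        times.append(t)
        values.append(cif)
    return np.array(times), np.array(values)

# Tiny hypothetical example with two competing causes, no censoring.
time = np.array([1, 2, 3, 4, 5, 6])
event = np.array([1, 2, 1, 1, 2, 1])
t1, v1 = cumulative_incidence(time, event, cause=1)
t2, v2 = cumulative_incidence(time, event, cause=2)
```

With no censoring the cause-specific CIFs sum to one at the last event time, a useful sanity check on any implementation.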
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf120
Yifan Dai, Di Wu, Yufeng Liu
{"title":"Statistical significance of clustering for count data.","authors":"Yifan Dai, Di Wu, Yufeng Liu","doi":"10.1093/biomtc/ujaf120","DOIUrl":"10.1093/biomtc/ujaf120","url":null,"abstract":"<p><p>Clustering is widely used in biomedical research for meaningful subgroup identification. However, most existing clustering algorithms do not account for the statistical uncertainty of the resulting clusters and consequently may generate spurious clusters due to natural sampling variation. To address this problem, the Statistical Significance of Clustering (SigClust) method was developed to evaluate the significance of clusters in high-dimensional data. While SigClust has been successful in assessing clustering significance for continuous data, it is not specifically designed for discrete data, such as count data in genomics. Moreover, SigClust and its variations can suffer from reduced statistical power when applied to non-Gaussian high-dimensional data. To overcome these limitations, we propose SigClust-DEV, a method designed to evaluate the significance of clusters in count data. Through extensive simulations, we compare SigClust-DEV against other existing SigClust approaches across various count distributions and demonstrate its superior performance. 
Furthermore, we apply our proposed SigClust-DEV to Hydra single-cell RNA sequencing (scRNA) data and electronic health records (EHRs) of cancer patients to identify meaningful latent cell types and patient subgroups, respectively.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448855/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145091099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
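The SigClust idea the abstract builds on can be sketched in a few lines: compute a 2-means cluster index for the data, then compare it with indices from data simulated under a single-cluster null. The sketch below uses the simplest Gaussian null with matched marginal variances; the paper's SigClust-DEV targets count distributions instead. `two_means_ci` and `sigclust_pvalue` are illustrative names.

```python
import numpy as np

def two_means_ci(X, iters=20):
    # Cluster index: within-cluster SS of a 2-means split over total SS.
    mu = X.mean(0)
    total = ((X - mu) ** 2).sum()
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    lab = (X - mu) @ Vt[0] > 0          # deterministic init: split on PC1
    for _ in range(iters):              # Lloyd iterations
        if lab.all() or not lab.any():
            return 1.0                  # degenerate split: no clustering
        c0, c1 = X[~lab].mean(0), X[lab].mean(0)
        lab = ((X - c1) ** 2).sum(1) < ((X - c0) ** 2).sum(1)
    if lab.all() or not lab.any():
        return 1.0
    wss = (((X[~lab] - X[~lab].mean(0)) ** 2).sum()
           + ((X[lab] - X[lab].mean(0)) ** 2).sum())
    return wss / total

def sigclust_pvalue(X, n_sim=200, seed=0):
    # Monte-Carlo p-value: fraction of null (independent Gaussian, matched
    # per-coordinate sd) datasets with a cluster index at least as small.
    rng = np.random.default_rng(seed)
    obs = two_means_ci(X)
    sd = X.std(0)
    null = [two_means_ci(rng.normal(0, sd, size=X.shape)) for _ in range(n_sim)]
    return (1 + sum(ci <= obs for ci in null)) / (n_sim + 1)

# Hypothetical two-cluster data: 6-sd separation in 5 dimensions.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(size=(30, 5)), rng.normal(size=(30, 5)) + 6.0])
p = sigclust_pvalue(X, n_sim=50, seed=7)
```

A small p-value indicates the observed split is tighter than sampling variation under the single-cluster null can explain.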
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf094
John Neuhaus, Charles McCulloch, Ross Boylan
{"title":"Improved prediction and flagging of extreme random effects for non-Gaussian outcomes using weighted methods.","authors":"John Neuhaus, Charles McCulloch, Ross Boylan","doi":"10.1093/biomtc/ujaf094","DOIUrl":"10.1093/biomtc/ujaf094","url":null,"abstract":"<p><p>Investigators often focus on predicting extreme random effects from mixed effects models fitted to longitudinal or clustered data, and on identifying or \"flagging\" outliers such as poorly performing hospitals or rapidly deteriorating patients. Our recent work with Gaussian outcomes showed that weighted prediction methods can substantially reduce mean square error of prediction for extremes and substantially increase correct flagging rates compared to previous methods, while controlling the incorrect flagging rates. This paper extends the weighted prediction methods to non-Gaussian outcomes such as binary and count data. Closed-form expressions for predicted random effects and probabilities of correct and incorrect flagging are not available for the usual non-Gaussian outcomes, and the computational challenges are substantial. Therefore, our results include the development of theory to support algorithms that tune predictors that we call \"self-calibrated\" (which control the incorrect flagging rate using very simple flagging rules) and innovative numerical methods to calculate weighted predictors as well as to evaluate their performance. Comprehensive numerical evaluations show that the novel weighted predictors for non-Gaussian outcomes have substantially lower mean square error of prediction at the extremes and considerably higher correct flagging rates than previously proposed methods, while controlling the incorrect flagging rates. 
We illustrate our new methods using data on emergency room readmissions for children with asthma.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309285/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144741072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
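To convey what "weighted prediction" means here, the sketch below computes, for the closed-form Gaussian case the authors' earlier work covered, the predictor minimising a weighted squared error E[w(b)(b - a)^2 | data], which equals E[w(b) b | data] / E[w(b) | data]. The non-Gaussian case is where the paper's algorithms are needed; this Gaussian sketch with hypothetical function `weighted_predictor` is only meant to show how a weight tilting toward extremes reduces shrinkage.

```python
import numpy as np

def weighted_predictor(ybar, n_i, sigma2_e, sigma2_b, weight):
    """Weighted prediction of a Gaussian random effect b from a cluster
    mean `ybar` of n_i observations. Minimising E[weight(b)*(b - a)^2 | data]
    gives a = E[weight(b)*b | data] / E[weight(b) | data], evaluated here by
    numerical integration over the (Gaussian) posterior of b.
    weight(b) = 1 recovers the usual BLUP / posterior mean."""
    shrink = sigma2_b / (sigma2_b + sigma2_e / n_i)
    post_mean = shrink * ybar
    post_var = shrink * sigma2_e / n_i
    grid = post_mean + np.linspace(-6, 6, 2001) * np.sqrt(post_var)
    dens = np.exp(-0.5 * (grid - post_mean) ** 2 / post_var)
    w = weight(grid) * dens
    # Uniform grid, so the spacing cancels in the ratio of sums.
    return (w * grid).sum() / w.sum()

# Unweighted case reproduces the shrinkage (BLUP) predictor; an |b| weight
# tilts the prediction away from zero, i.e. shrinks extreme clusters less.
blup = weighted_predictor(2.0, 5, 1.0, 1.0, lambda b: np.ones_like(b))
wpred = weighted_predictor(2.0, 5, 1.0, 1.0, lambda b: np.abs(b))
```

The flagging rules in the paper then threshold such predictors so that the incorrect flagging rate is controlled.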
{"title":"A monotone single index model for spatially referenced multistate current status data.","authors":"Snigdha Das, Minwoo Chae, Debdeep Pati, Dipankar Bandyopadhyay","doi":"10.1093/biomtc/ujaf105","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf105","url":null,"abstract":"<p><p>Assessment of multistate disease progression is commonplace in biomedical research, such as in periodontal disease (PD). However, the presence of multistate current status endpoints, where only a single snapshot of each subject's progression through disease states is available at a random inspection time after a known starting state, complicates the inferential framework. In addition, these endpoints can be clustered, and spatially associated, where a group of proximally located teeth (within subjects) may experience similar PD status, compared to those distally located. Motivated by a clinical study recording PD progression, we propose a Bayesian semiparametric accelerated failure time model with an inverse-Wishart proposal for accommodating (spatial) random effects, and flexible errors that follow a Dirichlet process mixture of Gaussians. For clinical interpretability, the systematic component of the event times is modeled using a monotone single index model, with the (unknown) link function estimated via a novel integrated basis expansion and basis coefficients endowed with constrained Gaussian process priors. In addition to establishing parameter identifiability, we present scalable computing via a combination of elliptical slice sampling, fast circulant embedding techniques, and smoothing of hard constraints, leading to straightforward estimation of parameters, and state occupation and transition probabilities. Using synthetic data, we study the finite sample properties of our Bayesian estimates and their performance under model misspecification. 
We also illustrate our method via application to the real clinical PD dataset.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12391879/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
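The monotone-link ingredient of the model above can be illustrated without any of the Bayesian machinery: given a fixed index direction, a monotone estimate of the link follows from isotonic regression (pool-adjacent-violators) along the sorted index. The paper instead estimates the link with an integrated basis expansion and constrained Gaussian process priors, jointly with the direction; `pava` and `monotone_index_fit` below are a hypothetical frequentist sketch.

```python
import numpy as np

def pava(y):
    # Pool-adjacent-violators: least-squares nondecreasing fit to y.
    level, weight = [], []
    for v in map(float, y):
        level.append(v); weight.append(1.0)
        while len(level) > 1 and level[-2] > level[-1]:
            w = weight[-2] + weight[-1]
            m = (level[-2] * weight[-2] + level[-1] * weight[-1]) / w
            level[-2:] = [m]; weight[-2:] = [w]
    out = []
    for m, w in zip(level, weight):
        out.extend([m] * int(w))
    return np.array(out)

def monotone_index_fit(X, y, beta):
    # Given an index direction beta, estimate the monotone link g in
    # y ~ g(X beta) by isotonic regression along the sorted index.
    u = X @ (beta / np.linalg.norm(beta))
    order = np.argsort(u)
    ghat = np.empty(len(y))
    ghat[order] = pava(np.asarray(y, float)[order])
    return u, ghat

# Hypothetical data: a monotone cubic link of a 2-variable index plus noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
beta = np.array([1.0, 2.0, 0.0])
y = (X @ beta) ** 3 / 10 + rng.normal(scale=0.1, size=100)
u, ghat = monotone_index_fit(X, y, beta)
```

The fitted values are guaranteed nondecreasing in the index, which is the interpretability constraint the single index model trades on.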
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf088
Simon N Wood
{"title":"Simple simulation based reconstruction of incidence rates from death data.","authors":"Simon N Wood","doi":"10.1093/biomtc/ujaf088","DOIUrl":"10.1093/biomtc/ujaf088","url":null,"abstract":"<p><p>Daily deaths from an infectious disease provide a means for retrospectively inferring daily incidence, given knowledge of the infection-to-death interval distribution. Existing methods for doing so rely either on fitting simplified non-linear epidemic models to the deaths data or on spline based deconvolution approaches. The former runs the risk of introducing unintended artefacts via the model formulation, while the latter may be viewed as technically obscure, impeding uptake by practitioners. This note proposes a simple simulation based approach to inferring fatal incidence from deaths that requires minimal assumptions, is easy to understand, and allows testing of alternative hypothesized incidence trajectories. The aim is that in any future situation similar to the COVID pandemic, the method can be easily, rapidly, transparently, and uncontroversially deployed as an input to management.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
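The forward model underlying this kind of reconstruction is a discrete convolution: expected deaths on day t accumulate earlier fatal incidence weighted by the infection-to-death interval distribution. A candidate incidence trajectory can then be scored against observed deaths, e.g. by a Poisson log-likelihood. The sketch below shows that forward model only; it is a generic convolution, not the paper's simulation procedure, and `expected_deaths` / `poisson_nll` are illustrative names.

```python
import numpy as np

def expected_deaths(incidence, delay_pmf):
    """Expected daily deaths given daily fatal incidence and the
    infection-to-death interval distribution (discrete pmf over lags)."""
    incidence = np.asarray(incidence, float)
    T = len(incidence)
    mu = np.zeros(T)
    for lag, p in enumerate(delay_pmf):
        if lag >= T:
            break
        mu[lag:] += p * incidence[:T - lag]   # infections lag days ago
    return mu

def poisson_nll(deaths, mu, eps=1e-12):
    # Poisson negative log-likelihood (up to a constant) for scoring a
    # hypothesised incidence trajectory against observed deaths.
    mu = np.maximum(mu, eps)
    return np.sum(mu - np.asarray(deaths) * np.log(mu))

# Hypothetical check: 100 infections on day 0, deaths split evenly
# between lags 1 and 2, yields 50 expected deaths on each of days 1 and 2.
mu = expected_deaths([100, 0, 0, 0, 0], [0.0, 0.5, 0.5])
```

Comparing `poisson_nll` across alternative hypothesised trajectories is the simple hypothesis-testing use the note has in mind.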
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf101
Yisen Jin, Aaron J Molstad, Ander Wilson, Joseph Antonelli
{"title":"Smooth and shape-constrained quantile distributed lag models.","authors":"Yisen Jin, Aaron J Molstad, Ander Wilson, Joseph Antonelli","doi":"10.1093/biomtc/ujaf101","DOIUrl":"10.1093/biomtc/ujaf101","url":null,"abstract":"<p><p>Exposure to environmental pollutants during the gestational period can significantly impact infant health outcomes, such as birth weight and neurological development. Identifying critical windows of susceptibility, which are specific periods during pregnancy when exposure has the most profound effects, is essential for developing targeted interventions. Distributed lag models (DLMs) are widely used in environmental epidemiology to analyze the temporal patterns of exposure and their impact on health outcomes. However, traditional DLMs focus on modeling the conditional mean, which may fail to capture heterogeneity in the relationship between predictors and the outcome. Moreover, when modeling the distribution of health outcomes like gestational birth weight, it is the extreme quantiles that are of most clinical relevance. We introduce 2 new quantile distributed lag model (QDLM) estimators designed to address the limitations of existing methods by leveraging smoothness and shape constraints, such as unimodality and concavity, to enhance interpretability and efficiency. 
We apply our QDLM estimators to the Colorado birth cohort data, demonstrating their effectiveness in identifying critical windows of susceptibility and informing public health interventions.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12381565/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
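A bare-bones QDLM can be sketched as penalised check-loss (pinball) regression of an outcome quantile on lagged exposures, with a roughness penalty on the lag-coefficient curve. The sketch below imposes only smoothness via a second-difference penalty, not the unimodality or concavity shape constraints that are the paper's contribution; `fit_qdlm`, the penalty weight, and the optimiser choice are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    # Pinball/check loss for quantile level tau.
    return np.sum(u * (tau - (u < 0)))

def fit_qdlm(L, y, tau=0.5, lam=1.0):
    """Penalised quantile distributed lag fit:
    y ~ theta0 + sum_l theta[l] * exposure_at_lag_l, with a
    second-difference roughness penalty on the lag curve theta.
    L is an (n, n_lags) matrix of lagged exposures."""
    n_lag = L.shape[1]
    D = np.diff(np.eye(n_lag), 2, axis=0)   # second-difference operator
    def objective(par):
        theta0, theta = par[0], par[1:]
        return (check_loss(y - theta0 - L @ theta, tau)
                + lam * np.sum((D @ theta) ** 2))
    res = minimize(objective, np.zeros(1 + n_lag), method="Powell")
    return res.x[0], res.x[1:]

# Hypothetical example: a smooth hump of lag effects, median regression.
rng = np.random.default_rng(3)
L = rng.normal(size=(120, 6))
theta_true = np.array([0.0, 0.5, 1.0, 0.5, 0.0, 0.0])
y = 1.0 + L @ theta_true + rng.normal(scale=0.1, size=120)
theta0, theta = fit_qdlm(L, y, tau=0.5, lam=0.1)
```

Refitting at tau = 0.05 or 0.95 targets the clinically relevant extreme quantiles of outcomes such as birth weight.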
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf089
Fangting Zhou, Kejun He, Yang Ni
{"title":"Tree-based additive noise directed acyclic graphical models for nonlinear causal discovery with interactions.","authors":"Fangting Zhou, Kejun He, Yang Ni","doi":"10.1093/biomtc/ujaf089","DOIUrl":"10.1093/biomtc/ujaf089","url":null,"abstract":"<p><p>Directed acyclic graphical models with additive noises are essential in nonlinear causal discovery and have numerous applications in various domains, such as social science and systems biology. Most such models further assume that structural causal functions are additive to ensure causal identifiability and computational feasibility, which may be too restrictive in the presence of causal interactions. Some methods consider general nonlinear causal functions represented by, for example, Gaussian processes and neural networks, to accommodate interactions. However, they are either computationally intensive or lack interpretability. We propose a highly interpretable and computationally feasible approach using trees to incorporate interactions in nonlinear causal discovery, termed tree-based additive noise models. The nature of the tree construction leads to piecewise constant causal functions, making existing causal identifiability results of additive noise models with continuous and smooth causal functions inapplicable. Therefore, we provide new conditions under which the proposed model is identifiable. We develop a recursive algorithm for source node identification and a score-based ordering search algorithm. Through extensive simulations, we demonstrate the utility of the proposed model and algorithms benchmarking against existing additive noise models, especially when there are strong causal interactions. 
Our method is applied to infer a protein-protein interaction network for breast cancer, where proteins may form protein complexes to perform their functions.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288665/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrics · Pub Date: 2025-07-03 · DOI: 10.1093/biomtc/ujaf095
Guorong Dai, Raymond J Carroll, Jinbo Chen
{"title":"Valid and efficient inference for nonparametric variable importance in two-phase studies.","authors":"Guorong Dai, Raymond J Carroll, Jinbo Chen","doi":"10.1093/biomtc/ujaf095","DOIUrl":"10.1093/biomtc/ujaf095","url":null,"abstract":"<p><p>We consider a common nonparametric regression setting, where the data consist of a response variable Y, some easily obtainable covariates $\mathbf{X}$, and a set of costly covariates $\mathbf{Z}$. Before establishing predictive models for Y, a natural question arises: Is it worthwhile to include $\mathbf{Z}$ as predictors, given the additional cost of collecting data on $\mathbf{Z}$ both for training the models and for predicting Y for future individuals? We therefore aim to conduct preliminary investigations to infer the importance of $\mathbf{Z}$ in predicting Y in the presence of $\mathbf{X}$. To achieve this goal, we propose a nonparametric variable importance measure for $\mathbf{Z}$. It is defined as a parameter that aggregates the maximum potential contributions of $\mathbf{Z}$ in single or multiple predictive models, with contributions quantified by general loss functions. Considering two-phase data that provide a large number of observations for $(Y,\mathbf{X})$ with the expensive $\mathbf{Z}$ measured only in a small subsample, we develop a novel approach to infer the proposed importance measure, accommodating missingness of $\mathbf{Z}$ in the sample by substituting functions of $(Y,\mathbf{X})$ for each individual's contribution to the predictive loss of models involving $\mathbf{Z}$. Our approach attains unified and efficient inference regardless of whether $\mathbf{Z}$ makes zero or positive contribution to predicting Y, a desirable yet surprising property owing to data incompleteness. As intermediate steps of our theoretical development, we establish novel results in two relevant research areas, semi-supervised inference and two-phase nonparametric estimation. 
Numerical results from both simulated and real data demonstrate superior performance of our approach.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312401/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144752245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
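The estimand here, the drop in optimal predictive loss from adding Z to the predictors, has a simple plug-in analogue on fully observed data: compare out-of-fold losses of learners fit with and without Z. The sketch below uses OLS and squared error as stand-ins for the general learners and loss functions in the paper, and ignores the two-phase missingness that the paper's method is designed to handle; `mse_oof` and `variable_importance` are illustrative names.

```python
import numpy as np

def mse_oof(features, y, n_folds=5):
    # Out-of-fold MSE of an OLS fit (a stand-in for an arbitrary learner).
    n = len(y)
    idx = np.arange(n) % n_folds
    err = np.empty(n)
    for f in range(n_folds):
        tr, te = idx != f, idx == f
        A = np.column_stack([np.ones(tr.sum()), features[tr]])
        coef, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
        A_te = np.column_stack([np.ones(te.sum()), features[te]])
        err[te] = (y[te] - A_te @ coef) ** 2
    return err.mean()

def variable_importance(X, Z, y):
    """Plug-in loss-based importance of Z given X: the drop in
    out-of-sample squared error from adding Z to the predictors
    (nonnegative in population). The paper's contribution is valid,
    efficient *inference* for this kind of target under two-phase
    sampling, which a point estimate alone does not provide."""
    return mse_oof(X, y) - mse_oof(np.column_stack([X, Z]), y)

# Hypothetical data where Z carries strong signal beyond X.
rng = np.random.default_rng(5)
X = rng.normal(size=400)
Z = rng.normal(size=400)
y = X + 2.0 * Z + rng.normal(scale=0.5, size=400)
vi = variable_importance(X, Z, y)
```

When Z is uninformative the population importance sits on the boundary at zero, which is exactly the regime where naive inference breaks down and the paper's unified approach is needed.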