Annals of Applied Statistics最新文献

筛选
英文 中文
KULLBACK-LEIBLER-BASED DISCRETE FAILURE TIME MODELS FOR INTEGRATION OF PUBLISHED PREDICTION MODELS WITH NEW TIME-TO-EVENT DATASET. 基于kullback - leibler的离散故障时间模型,用于集成已发布的预测模型和新的时间到事件数据集。
IF 1.4 4区 数学
Annals of Applied Statistics Pub Date : 2025-06-01 Epub Date: 2025-05-28 DOI: 10.1214/24-aoas1955
Di Wang, Wen Ye, Randall Sung, Hui Jiang, Jeremy M G Taylor, Lisa Ly, Kevin He
{"title":"KULLBACK-LEIBLER-BASED DISCRETE FAILURE TIME MODELS FOR INTEGRATION OF PUBLISHED PREDICTION MODELS WITH NEW TIME-TO-EVENT DATASET.","authors":"Di Wang, Wen Ye, Randall Sung, Hui Jiang, Jeremy M G Taylor, Lisa Ly, Kevin He","doi":"10.1214/24-aoas1955","DOIUrl":"10.1214/24-aoas1955","url":null,"abstract":"<p><p>Prediction of time-to-event data often suffers from rare event rates, small sample sizes, high dimensionality, and low signal-to-noise ratios. Incorporating published prediction models from external large-scale studies is expected to improve the performance of prognosis prediction from internal individual-level data. However, existing integration approaches typically assume that the underlying distributions of the external and internal data sources are similar, which is often invalid. To account for challenges, including heterogeneity, data sharing, and privacy constraints, we propose a failure time integration procedure, which utilizes a discrete hazard-based Kullback-Leibler discriminatory information measuring the discrepancy between the external models and the internal dataset. The asymptotic properties and simulation results show the advantage of the proposed method compared to those solely based on internal data. We apply the proposed method to improve prediction performance on a kidney transplant dataset from a local hospital by integrating this small-sized dataset with a published survival model obtained from the national transplant registry.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 2","pages":"1167-1189"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12797872/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A DEEP NEURAL NETWORK TWO-PART MODEL AND FEATURE IMPORTANCE TEST FOR SEMI-CONTINUOUS DATA. 半连续数据的深度神经网络两部分模型及特征重要性检验。
IF 1.4 4区 数学
Annals of Applied Statistics Pub Date : 2025-06-01 Epub Date: 2025-05-28 DOI: 10.1214/25-aoas2013
Baiming Zou, Xinlei Mi, Shiyu Wan, Di Wu, James G Xenakis, Jianhua Hu, Fei Zou
{"title":"A DEEP NEURAL NETWORK TWO-PART MODEL AND FEATURE IMPORTANCE TEST FOR SEMI-CONTINUOUS DATA.","authors":"Baiming Zou, Xinlei Mi, Shiyu Wan, Di Wu, James G Xenakis, Jianhua Hu, Fei Zou","doi":"10.1214/25-aoas2013","DOIUrl":"10.1214/25-aoas2013","url":null,"abstract":"<p><p>Semi-continuous data frequently arise in clinical practice. For example, while many surgical patients still suffer from varying degrees of acute postoperative pain (POP) sometime after surgery (i.e., POP score > 0), others experience none (i.e., POP score = 0), indicating the existence of two distinct data processes at play. Existing parametric or semi-parametric two-part modeling methods for this type of semi-continuous data can fail to appropriately model the two underlying data processes as such methods rely heavily on (generalized) linear additive assumptions. However, many factors may interact to jointly influence the experience of POP non-additively and non-linearly. Motivated by this challenge and inspired by the flexibility of deep neural networks (DNN) to accurately approximate complex functions universally, we derive a DNN-based two-part model by adapting the conventional DNN methods with two additional components: a bootstrapping procedure along with a filtering algorithm to boost the stability of the conventional DNN, an approach we denote as sDNN. To improve the interpretability and transparency of sDNN, we further derive a feature importance testing procedure to identify important features associated with the outcome measurements of the two data processes, denoting this approach fsDNN. We show that fsDNN not only offers a statistical inference procedure for each feature under complex association but also that using the identified features can further improve the predictive performance of sDNN. The proposed sDNN- and fsDNN-based two-part models are applied to the analysis of real data from a POP study, in which application they clearly demonstrate advantages over the existing parametric and semi-parametric two-part models. Further, we conduct extensive numerical studies and draw comparisons with other machine learning methods to demonstrate that sDNN and fsDNN consistently outperform the existing two-part models and frequently used machine learning methods regardless of the data complexity. An R package implementing the proposed methods has been developed and is available in the Supplementary Material (Zou et al, 2025) and is also deposited on GitHub (https://github.com/BZou-lab/fsDNN).</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 2","pages":"1314-1331"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263096/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144644080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
BAYESIAN DATA AUGMENTATION FOR RECURRENT EVENTS UNDER INTERMITTENT ASSESSMENT IN OVERLAPPING INTERVALS WITH APPLICATIONS TO EMR DATA. 重复事件在重叠区间间歇评估下的贝叶斯数据增强与emr数据的应用。
IF 1.4 4区 数学
Annals of Applied Statistics Pub Date : 2025-06-01 Epub Date: 2025-05-28 DOI: 10.1214/24-aoas2007
Xin Liu, Patrick M Schnell
{"title":"BAYESIAN DATA AUGMENTATION FOR RECURRENT EVENTS UNDER INTERMITTENT ASSESSMENT IN OVERLAPPING INTERVALS WITH APPLICATIONS TO EMR DATA.","authors":"Xin Liu, Patrick M Schnell","doi":"10.1214/24-aoas2007","DOIUrl":"10.1214/24-aoas2007","url":null,"abstract":"<p><p>Electronic medical records (EMR) data contain rich information that can facilitate health-related studies but is collected primarily for purposes other than research. For recurrent events, EMR data often do not record event times or counts but only contain intermittently assessed and censored observations (i.e. upper and/or lower bounds for counts in a time interval) at uncontrolled times. This can result in non-contiguous or overlapping assessment intervals with censored event counts. Existing methods for analyzing intermittently assessed recurrent events assume disjoint assessment intervals with known counts (interval count data) due to a focus on prospective studies with controlled assessment times. We propose a Bayesian data augmentation method to analyze the complicated assessments in EMR data for recurrent events. Within a Gibbs sampler, event times are imputed by generating sets of event times from non-homogeneous Poisson processes and rejecting proposed sets that are incompatible with constraints imposed by assessment data. Based on the independent increments property of Poisson processes, we implement three techniques to speed up this rejection sampling imputation method for large EMR datasets: independent sampling by partitioning, truncated generation, and sequential sampling. In a simulation study we show our method accurately estimates parameters of log-linear Poisson process intensities. Although the proposed method can be applied generally to EMR data of recurrent events, our study is specifically motivated by identifying risk factors for falls due to cancer treatment and its supportive medications. We used the proposed method to analyze an EMR dataset comprising 5501 patients treated for breast cancer. Our analysis provides evidence supporting associations between certain risk factors (including classes of medications) and risk of falls.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 2","pages":"1332-1361"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393837/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144976823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DYNAMIC PREDICTION WITH MULTIVARIATE LONGITUDINAL OUTCOMES AND LONGITUDINAL MAGNETIC RESONANCE IMAGING DATA. 动态预测与多元纵向结果和纵向磁共振成像数据。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2025-03-01 Epub Date: 2025-03-17 DOI: 10.1214/24-aoas1970
Haotian Zou, Luo Xiao, Donglin Zeng, Sheng Luo
{"title":"DYNAMIC PREDICTION WITH MULTIVARIATE LONGITUDINAL OUTCOMES AND LONGITUDINAL MAGNETIC RESONANCE IMAGING DATA.","authors":"Haotian Zou, Luo Xiao, Donglin Zeng, Sheng Luo","doi":"10.1214/24-aoas1970","DOIUrl":"10.1214/24-aoas1970","url":null,"abstract":"<p><p>Alzheimer's Disease (AD) is a common neurodegenerative disorder impairing multiple domains. Recent AD studies, for example, the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, collect multimodal data to better understand AD severity and progression. To facilitate precision medicine for high-risk individuals, it is essential to develop an AD predictive model that leverages multimodal data and provides accurate personalized predictions of dementia occurrences. In this article we propose a multivariate functional mixed model with longitudinal magnetic resonance imaging data (MFMM-LMRI) that jointly models longitudinal neurological scores, longitudinal voxelwise MRI data, and the survival outcome as dementia onset. We model longitudinal MRI data using the joint and individual variation explained (JIVE) approach. We investigate two functional forms linking the longitudinal and survival processes. We adopt the Markov chain Monte Carlo (MCMC) method to obtain posterior samples. We establish a dynamic prediction framework that predicts longitudinal trajectories and the probability of dementia occurrence. The simulation study with various sample sizes and event rates supports the validity of the method. We apply the MFMM-LMRI to the motivating ADNI study and conclude that additional ApoE-<i>ϵ</i>4 alleles and a higher latent disease profile are associated with a higher risk of dementia onset. We detect a significant association between the longitudinal MRI data and the survival outcome. The instantaneous model with longitudinal MRI data has the best fitting and predictive performance.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 1","pages":"505-528"},"PeriodicalIF":1.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206078/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144530914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
LOW-RANK LONGITUDINAL FACTOR REGRESSION WITH APPLICATION TO CHEMICAL MIXTURES. 低秩纵向因子回归及其在化学混合物中的应用。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2025-03-01 Epub Date: 2025-03-17 DOI: 10.1214/24-aoas1988
Glenn Palmer, Amy H Herring, David B Dunson
{"title":"LOW-RANK LONGITUDINAL FACTOR REGRESSION WITH APPLICATION TO CHEMICAL MIXTURES.","authors":"Glenn Palmer, Amy H Herring, David B Dunson","doi":"10.1214/24-aoas1988","DOIUrl":"https://doi.org/10.1214/24-aoas1988","url":null,"abstract":"<p><p>Developmental epidemiology commonly focuses on assessing the association between multiple early life exposures and childhood health. Statistical analyses of data from such studies focus on inferring the contributions of individual exposures, while also characterizing time-varying and interacting effects. Such inferences are made more challenging by correlations among exposures, nonlinearity, and the curse of dimensionality. Motivated by studying the effects of prenatal bisphenol A (BPA) and phthalate exposures on glucose metabolism in adolescence using data from the ELEMENT study, we propose a low-rank longitudinal factor regression (LowFR) model for tractable inference on flexible longitudinal exposure effects. LowFR handles highly-correlated exposures using a Bayesian dynamic factor model, which is fit jointly with a health outcome via a novel factor regression approach. The model collapses on simpler and intuitive submodels when appropriate, while expanding to allow considerable flexibility in time-varying and interaction effects when supported by the data. After demonstrating LowFR's effectiveness in simulations, we use it to analyze the ELEMENT data and find that diethyl and dibutyl phthalate metabolite levels in trimesters 1 and 2 are associated with altered glucose metabolism in adolescence.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 1","pages":"769-797"},"PeriodicalIF":1.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12013532/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144057647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
INFERRING SYNERGISTIC AND ANTAGONISTIC INTERACTIONS IN MIXTURES OF EXPOSURES. 推断暴露混合物中的协同和拮抗相互作用。
IF 1.4 4区 数学
Annals of Applied Statistics Pub Date : 2025-03-01 Epub Date: 2025-03-17 DOI: 10.1214/24-aoas1948
Shounak Chattopadhyay, Stephanie M Engel, David Dunson
{"title":"INFERRING SYNERGISTIC AND ANTAGONISTIC INTERACTIONS IN MIXTURES OF EXPOSURES.","authors":"Shounak Chattopadhyay, Stephanie M Engel, David Dunson","doi":"10.1214/24-aoas1948","DOIUrl":"10.1214/24-aoas1948","url":null,"abstract":"<p><p>There is abundant interest in assessing the joint effects of multiple exposures on human health. This is often referred to as the mixtures problem in environmental epidemiology and toxicology. Classically, studies have examined the adverse health effects of different chemicals one at a time, but there is concern that certain chemicals may act together to amplify each other's effects. Such amplification is referred to as <i>synergistic</i> interaction, while chemicals that inhibit each other's effects have <i>antagonistic</i> interactions. Current approaches for assessing the health effects of chemical mixtures do not explicitly consider synergy or antagonism in the modeling, instead focusing on either parametric or unconstrained nonparametric dose response surface modeling. The parametric case can be too inflexible, while nonparametric methods face a curse of dimensionality that leads to overly wiggly and uninterpretable surface estimates. We propose a Bayesian approach that decomposes the response surface into additive main effects and pairwise interaction effects and then detects synergistic and antagonistic interactions. Variable selection decisions for each interaction component are also provided. This Synergistic Antagonistic Interaction Detection (SAID) framework is evaluated relative to existing approaches using simulation experiments and an application to data from NHANES.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 1","pages":"169-190"},"PeriodicalIF":1.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393835/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144976792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HETEROGENEOUS TREATMENT AND SPILLOVER EFFECTS UNDER CLUSTERED NETWORK INTERFERENCE. 集群网络干扰下的异质性处理与溢出效应。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2025-03-01 Epub Date: 2025-03-17 DOI: 10.1214/24-aoas1913
Falco J Bargagli-Stoffi, Costanza Tortú, Laura Forastiere
{"title":"HETEROGENEOUS TREATMENT AND SPILLOVER EFFECTS UNDER CLUSTERED NETWORK INTERFERENCE.","authors":"Falco J Bargagli-Stoffi, Costanza Tortú, Laura Forastiere","doi":"10.1214/24-aoas1913","DOIUrl":"10.1214/24-aoas1913","url":null,"abstract":"<p><p>The bulk of causal inference studies rule out the presence of interference between units. However, in many real-world scenarios, units are interconnected by social, physical, or virtual ties, and the effect of the treatment can spill from one unit to other connected individuals in the network. In this paper, we develop a machine learning method that uses tree-based algorithms and a Horvitz-Thompson estimator to assess the heterogeneity of treatment and spillover effects with respect to individual, neighborhood, and network characteristics in the context of clustered networks and interference within clusters. The proposed network causal tree (NCT) algorithm has several advantages. First, it allows the investigation of the heterogeneity of the treatment effect, avoiding potential bias due to the presence of interference. Second, understanding the heterogeneity of both treatment and spillover effects can guide policymakers in scaling up interventions, designing targeting strategies, and increasing cost-effectiveness. We investigate the performance of our NCT method using a Monte Carlo simulation study and illustrate its application to assess the heterogeneous effects of information sessions on the uptake of a new weather insurance policy in rural China.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"19 1","pages":"28-55"},"PeriodicalIF":1.3,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245184/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144610248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CAUSAL HEALTH IMPACTS OF POWER PLANT EMISSION CONTROLS UNDER MODELED AND UNCERTAIN PHYSICAL PROCESS INTERFERENCE. 模拟和不确定物理过程干扰下电厂排放控制的因果健康影响。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2024-12-01 Epub Date: 2024-10-31 DOI: 10.1214/24-aoas1904
Nathan B Wikle, Corwin M Zigler
{"title":"CAUSAL HEALTH IMPACTS OF POWER PLANT EMISSION CONTROLS UNDER MODELED AND UNCERTAIN PHYSICAL PROCESS INTERFERENCE.","authors":"Nathan B Wikle, Corwin M Zigler","doi":"10.1214/24-aoas1904","DOIUrl":"10.1214/24-aoas1904","url":null,"abstract":"<p><p>Causal inference with spatial environmental data is often challenging due to the presence of interference: outcomes for observational units depend on some combination of local and nonlocal treatment. This is especially relevant when estimating the effect of power plant emissions controls on population health, as pollution exposure is dictated by: (i) the location of point-source emissions as well as (ii) the transport of pollutants across space via dynamic physical-chemical processes. In this work we estimate the effectiveness of air quality interventions at coal-fired power plants in reducing two adverse health outcomes in Texas in 2016: pediatric asthma ED visits and Medicare all-cause mortality. We develop methods for causal inference with interference when the underlying network structure is not known with certainty and instead must be estimated from ancillary data. Notably, uncertainty in the interference structure is propagated to the resulting causal effect estimates. We offer a Bayesian, spatial mechanistic model for the interference mapping, which we combine with a flexible nonparametric outcome model to marginalize estimates of causal effects over uncertainty in the structure of interference. our analysis finds some evidence that emissions controls at upwind power plants reduce asthma ED visits and all-cause mortality; however, accounting for uncertainty in the interference renders the results largely inconclusive.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 4","pages":"2753-2774"},"PeriodicalIF":1.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11619076/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142787678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A LATENT VARIABLE MIXTURE MODEL FOR COMPOSITION-ON-COMPOSITION REGRESSION WITH APPLICATION TO CHEMICAL RECYCLING. 成分-成分回归的潜在变量混合模型及其在化工回收中的应用。
IF 1.4 4区 数学
Annals of Applied Statistics Pub Date : 2024-12-01 Epub Date: 2024-10-31 DOI: 10.1214/24-aoas1935
Nicholas Rios, Lingzhou Xue, Xiang Zhan
{"title":"A LATENT VARIABLE MIXTURE MODEL FOR COMPOSITION-ON-COMPOSITION REGRESSION WITH APPLICATION TO CHEMICAL RECYCLING.","authors":"Nicholas Rios, Lingzhou Xue, Xiang Zhan","doi":"10.1214/24-aoas1935","DOIUrl":"10.1214/24-aoas1935","url":null,"abstract":"<p><p>It is quite common to encounter compositional data in a regression framework in data analysis. When both responses and predictors are compositional, most existing models rely on a family of log-ratio based transformations to move the analysis from the simplex to the reals. This often makes the interpretation of the model more complex. A transformation-free regression model was recently developed, but it only allows for a single compositional predictor. However, many datasets include multiple compositional predictors of interest. Motivated by an application to hydrothermal liquefaction (HTL) data, a novel extension of this transformation-free regression model is provided that allows for two (or more) compositional predictors to be used via a latent variable mixture. A modified expectation-maximization algorithm is proposed to estimate model parameters, which are shown to have natural interpretations. Conformal inference is used to obtain prediction limits on the compositional response. The resulting methodology is applied to the HTL dataset. Extensions to multiple predictors are discussed.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 4","pages":"3253-3273"},"PeriodicalIF":1.4,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448131/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A SEMIPARAMETRIC METHOD FOR RISK PREDICTION USING INTEGRATED ELECTRONIC HEALTH RECORD DATA. 综合电子病历数据风险预测的半参数方法。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2024-12-01 Epub Date: 2024-10-31 DOI: 10.1214/24-AOAS1938
Jill Hasler, Yanyuan Ma, Yizheng Wei, Ravi Parikh, Jinbo Chen
{"title":"A SEMIPARAMETRIC METHOD FOR RISK PREDICTION USING INTEGRATED ELECTRONIC HEALTH RECORD DATA.","authors":"Jill Hasler, Yanyuan Ma, Yizheng Wei, Ravi Parikh, Jinbo Chen","doi":"10.1214/24-AOAS1938","DOIUrl":"10.1214/24-AOAS1938","url":null,"abstract":"<p><p>When using electronic health records (EHRs) for clinical and translational research, additional data is often available from external sources to enrich the information extracted from EHRs. For example, academic biobanks have more granular data available, and patient reported data is often collected through small-scale surveys. It is common that the external data is available only for a small subset of patients who have EHR information. We propose efficient and robust methods for building and evaluating models for predicting the risk of binary outcomes using such integrated EHR data. Our method is built upon an idea derived from the two-phase design literature that modeling the availability of a patient's external data as a function of an EHR-based preliminary predictive score leads to effective utilization of the EHR data. Through both theoretical and simulation studies, we show that our method has high efficiency for estimating log-odds ratio parameters, the area under the ROC curve, as well as other measures for quantifying predictive accuracy. We apply our method to develop a model for predicting the short-term mortality risk of oncology patients, where the data was extracted from the University of Pennsylvania hospital system EHR and combined with survey-based patient reported outcome data.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"18 4","pages":"3318-3337"},"PeriodicalIF":1.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934126/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143711932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信
小红书