Annals of Applied Statistics最新文献

筛选
英文 中文
TARGETING UNDERREPRESENTED POPULATIONS IN PRECISION MEDICINE: A FEDERATED TRANSFER LEARNING APPROACH. 针对精准医疗中代表性不足的人群:一种联合转移学习方法。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2023-12-01 Epub Date: 2023-10-30 DOI: 10.1214/23-AOAS1747
By Sai Li, Tianxi Cai, Rui Duan
{"title":"TARGETING UNDERREPRESENTED POPULATIONS IN PRECISION MEDICINE: A FEDERATED TRANSFER LEARNING APPROACH.","authors":"By Sai Li, Tianxi Cai, Rui Duan","doi":"10.1214/23-AOAS1747","DOIUrl":"10.1214/23-AOAS1747","url":null,"abstract":"<p><p>The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research poses a significant barrier to translating precision medicine research into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse populations and multiple healthcare institutions, with a focus on a target population of interest having limited sample sizes. We show that FETA achieves performance comparable to the pooled analysis, where individual-level data is shared across institutions, with only a small number of communications across participating sites. Our theoretical analysis and simulation study demonstrate how FETA's estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We apply FETA to multisite data from the electronic Medical Records and Genomics (eMERGE) Network to construct genetic risk prediction models for extreme obesity. Compared to models trained using target data only, source data only, and all data without accounting for population-level differences, FETA shows superior predictive performance. FETA has the potential to improve estimation and prediction accuracy in underrepresented populations and reduce the gap in model performance across populations.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 4","pages":"2970-2992"},"PeriodicalIF":1.3,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11417462/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ADDRESSING SELECTION BIAS AND MEASUREMENT ERROR IN COVID-19 CASE COUNT DATA USING AUXILIARY INFORMATION. 利用辅助信息解决 covid-19 病例计数数据中的选择偏差和测量误差。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2023-12-01 Epub Date: 2023-10-30 DOI: 10.1214/23-aoas1744
Walter Dempsey
{"title":"ADDRESSING SELECTION BIAS AND MEASUREMENT ERROR IN COVID-19 CASE COUNT DATA USING AUXILIARY INFORMATION.","authors":"Walter Dempsey","doi":"10.1214/23-aoas1744","DOIUrl":"https://doi.org/10.1214/23-aoas1744","url":null,"abstract":"<p><p>Coronavirus case-count data has influenced government policies and drives most epidemiological forecasts. Limited testing is cited as the key driver behind minimal information on the COVID-19 pandemic. While expanded testing is laudable, measurement error and selection bias are the two greatest problems limiting our understanding of the COVID-19 pandemic; neither can be fully addressed by increased testing capacity. In this paper, we demonstrate their impact on estimation of point prevalence and the effective reproduction number. We show that estimates based on the millions of molecular tests in the US has the same mean square error as a small simple random sample. To address this, a procedure is presented that combines case-count data and random samples over time to estimate selection propensities based on key covariate information. We then combine these selection propensities with epidemiological forecast models to construct a <i>doubly robust</i> estimation method that accounts for both measurement-error and selection bias. This method is then applied to estimate Indiana's active infection prevalence using case-count, hospitalization, and death data with demographic information, a statewide random molecular sample collected from April 25-29th, and Delphi's COVID-19 Trends and Impact Survey. We end with a series of recommendations based on the proposed methodology.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 4","pages":"2903-2923"},"PeriodicalIF":1.3,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11210953/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141472276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A DYNAMIC ADDITIVE AND MULTIPLICATIVE EFFECTS NETWORK MODEL WITH APPLICATION TO THE UNITED NATIONS VOTING BEHAVIORS. 将动态加乘效应网络模型应用于联合国投票行为。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2023-12-01 Epub Date: 2023-10-30 DOI: 10.1214/23-aoas1762
Bomin Kim, Xiaoyue Niu, David Hunter, Xun CaO
{"title":"A DYNAMIC ADDITIVE AND MULTIPLICATIVE EFFECTS NETWORK MODEL WITH APPLICATION TO THE UNITED NATIONS VOTING BEHAVIORS.","authors":"Bomin Kim, Xiaoyue Niu, David Hunter, Xun CaO","doi":"10.1214/23-aoas1762","DOIUrl":"10.1214/23-aoas1762","url":null,"abstract":"<p><p>Motivated by a study of United Nations voting behaviors, we introduce a regression model for a series of networks that are correlated over time. Our model is a dynamic extension of the additive and multiplicative effects network model (AMEN) of Hoff (2021). In addition to incorporating a temporal structure, the model accommodates two types of missing data thus allows the size of the network to vary over time. We demonstrate via simulations the necessity of various components of the model. We apply the model to the United Nations General Assembly voting data from 1983 to 2014 (Voeten, 2013) to answer interesting research questions regarding international voting behaviors. In addition to finding important factors that could explain the voting behaviors, the model-estimated additive effects, multiplicative effects, and their movements reveal meaningful foreign policy positions and alliances of various countries.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 4","pages":"3283-3299"},"PeriodicalIF":1.8,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10798233/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139514175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
BAYESIAN HIERARCHICAL MODELING AND ANALYSIS FOR ACTIGRAPH DATA FROM WEARABLE DEVICES. 对来自可穿戴设备的动作图数据进行贝叶斯分层建模和分析。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2023-12-01 Epub Date: 2023-10-30 DOI: 10.1214/23-aoas1742
Pierfrancesco Alaimo Di Loro, Marco Mingione, Jonah Lipsitt, Christina M Batteate, Michael Jerrett, Sudipto Banerjee
{"title":"BAYESIAN HIERARCHICAL MODELING AND ANALYSIS FOR ACTIGRAPH DATA FROM WEARABLE DEVICES.","authors":"Pierfrancesco Alaimo Di Loro, Marco Mingione, Jonah Lipsitt, Christina M Batteate, Michael Jerrett, Sudipto Banerjee","doi":"10.1214/23-aoas1742","DOIUrl":"10.1214/23-aoas1742","url":null,"abstract":"<p><p>The majority of Americans fail to achieve recommended levels of physical activity, which leads to numerous preventable health problems such as diabetes, hypertension, and heart diseases. This has generated substantial interest in monitoring human activity to gear interventions toward environmental features that may relate to higher physical activity. Wearable devices, such as wrist-worn sensors that monitor gross motor activity (actigraph units) continuously record the activity levels of a subject, producing massive amounts of high-resolution measurements. Analyzing actigraph data needs to account for spatial and temporal information on trajectories or paths traversed by subjects wearing such devices. Inferential objectives include estimating a subject's physical activity levels along a given trajectory; identifying trajectories that are more likely to produce higher levels of physical activity for a given subject; and predicting expected levels of physical activity in any proposed new trajectory for a given set of health attributes. Here, we devise a Bayesian hierarchical modeling framework for spatial-temporal actigraphy data to deliver fully model-based inference on trajectories while accounting for subject-level health attributes and spatial-temporal dependencies. We undertake a comprehensive analysis of an original dataset from the Physical Activity through Sustainable Transport Approaches in Los Angeles (PASTA-LA) study to ascertain spatial zones and trajectories exhibiting significantly higher levels of physical activity while accounting for various sources of heterogeneity.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 4","pages":"2865-2886"},"PeriodicalIF":1.8,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10815935/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139572045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Debiased lasso for stratified Cox models with application to the national kidney transplant data. 分层 Cox 模型的去偏套索,并应用于全国肾移植数据。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2023-12-01 Epub Date: 2023-10-30 DOI: 10.1214/23-aoas1775
Lu Xia, Bin Nan, Yi Li
{"title":"Debiased lasso for stratified Cox models with application to the national kidney transplant data.","authors":"Lu Xia, Bin Nan, Yi Li","doi":"10.1214/23-aoas1775","DOIUrl":"10.1214/23-aoas1775","url":null,"abstract":"<p><p>The Scientific Registry of Transplant Recipients (SRTR) system has become a rich resource for understanding the complex mechanisms of graft failure after kidney transplant, a crucial step for allocating organs effectively and implementing appropriate care. As transplant centers that treated patients might strongly confound graft failures, Cox models stratified by centers can eliminate their confounding effects. Also, since recipient age is a proven non-modifiable risk factor, a common practice is to fit models separately by recipient age groups. The moderate sample sizes, relative to the number of covariates, in some age groups may lead to biased maximum stratified partial likelihood estimates and unreliable confidence intervals even when samples still outnumber covariates. To draw reliable inference on a comprehensive list of risk factors measured from both donors and recipients in SRTR, we propose a de-biased lasso approach via quadratic programming for fitting stratified Cox models. We establish asymptotic properties and verify via simulations that our method produces consistent estimates and confidence intervals with nominal coverage probabilities. Accounting for nearly 100 confounders in SRTR, the de-biased method detects that the graft failure hazard nonlinearly increases with donor's age among all recipient age groups, and that organs from older donors more adversely impact the younger recipients. Our method also delineates the associations between graft failure and many risk factors such as recipients' primary diagnoses (e.g. polycystic disease, glomerular disease, and diabetes) and donor-recipient mismatches for human leukocyte antigen loci across recipient age groups. These results may inform the refinement of donor-recipient matching criteria for stakeholders.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 4","pages":"3550-3569"},"PeriodicalIF":1.3,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10720921/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138813084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A DYNAMIC SPATIAL FILTERING APPROACH TO MITIGATE UNDERESTIMATION BIAS IN FIELD CALIBRATED LOW-COST SENSOR AIR POLLUTION DATA. 一种动态空间过滤方法,用于减轻现场校准的低成本传感器空气污染数据的低估偏差。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2023-12-01 Epub Date: 2023-10-30 DOI: 10.1214/23-aoas1751
Claire Heffernan, Roger PenG, Drew R Gentner, Kirsten Koehler, Abhirup Datta
{"title":"A DYNAMIC SPATIAL FILTERING APPROACH TO MITIGATE UNDERESTIMATION BIAS IN FIELD CALIBRATED LOW-COST SENSOR AIR POLLUTION DATA.","authors":"Claire Heffernan, Roger PenG, Drew R Gentner, Kirsten Koehler, Abhirup Datta","doi":"10.1214/23-aoas1751","DOIUrl":"10.1214/23-aoas1751","url":null,"abstract":"<p><p>Low-cost air pollution sensors, offering hyper-local characterization of pollutant concentrations, are becoming increasingly prevalent in environmental and public health research. However, low-cost air pollution data can be noisy, biased by environmental conditions, and usually need to be field-calibrated by collocating low-cost sensors with reference-grade instruments. We show, theoretically and empirically, that the common procedure of regression-based calibration using collocated data systematically underestimates high air pollution concentrations, which are critical to diagnose from a health perspective. Current calibration practices also often fail to utilize the spatial correlation in pollutant concentrations. We propose a novel spatial filtering approach to collocation-based calibration of low-cost networks that mitigates the underestimation issue by using an inverse regression. The inverse-regression also allows for incorporating spatial correlations by a second-stage model for the true pollutant concentrations using a conditional Gaussian Process. Our approach works with one or more collocated sites in the network and is dynamic, leveraging spatial correlation with the latest available reference data. Through extensive simulations, we demonstrate how the spatial filtering substantially improves estimation of pollutant concentrations, and measures peak concentrations with greater accuracy. We apply the methodology for calibration of a low-cost PM<sub>2.5</sub> network in Baltimore, Maryland, and diagnose air pollution peaks that are missed by the regression-calibration.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 4","pages":"3056-3087"},"PeriodicalIF":1.3,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11031266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140864015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
BAYESIAN INFERENCE AND DYNAMIC PREDICTION FOR MULTIVARIATE LONGITUDINAL AND SURVIVAL DATA. 多变量纵向和生存数据的贝叶斯推理和动态预测。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2023-09-01 Epub Date: 2023-09-07 DOI: 10.1214/23-aoas1733
Haotian Zou, Donglin Zeng, Luo Xiao, Sheng Luo
{"title":"BAYESIAN INFERENCE AND DYNAMIC PREDICTION FOR MULTIVARIATE LONGITUDINAL AND SURVIVAL DATA.","authors":"Haotian Zou, Donglin Zeng, Luo Xiao, Sheng Luo","doi":"10.1214/23-aoas1733","DOIUrl":"10.1214/23-aoas1733","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a complex neurological disorder impairing multiple domains such as cognition and daily functions. To better understand the disease and its progression, many AD research studies collect multiple longitudinal outcomes that are strongly predictive of the onset of AD dementia. We propose a joint model based on a multivariate functional mixed model framework (referred to as MFMM-JM) that simultaneously models the multiple longitudinal outcomes and the time to dementia onset. We develop six functional forms to fully investigate the complex association between longitudinal outcomes and dementia onset. Moreover, we use the Bayesian methods for statistical inference and develop a dynamic prediction framework that provides accurate personalized predictions of disease progressions based on new subject-specific data. We apply the proposed MFMM-JM to two large ongoing AD studies: the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC), and identify the functional forms with the best predictive performance. our method is also validated by extensive simulation studies with five settings.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 3","pages":"2574-2595"},"PeriodicalIF":1.3,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500582/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10339586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
THE SCALABLE BIRTH-DEATH MCMC ALGORITHM FOR MIXED GRAPHICAL MODEL LEARNING WITH APPLICATION TO GENOMIC DATA INTEGRATION. 用于混合图形模型学习的可扩展出生路径MCMC算法及其在基因组数据集成中的应用。
IF 1.3 4区 数学
Annals of Applied Statistics Pub Date : 2023-09-01 Epub Date: 2023-10-07 DOI: 10.1214/22-aoas1701
Nanwei Wang, Hélène Massam, Xin Gao, Laurent Briollais
{"title":"THE SCALABLE BIRTH-DEATH MCMC ALGORITHM FOR MIXED GRAPHICAL MODEL LEARNING WITH APPLICATION TO GENOMIC DATA INTEGRATION.","authors":"Nanwei Wang, Hélène Massam, Xin Gao, Laurent Briollais","doi":"10.1214/22-aoas1701","DOIUrl":"10.1214/22-aoas1701","url":null,"abstract":"<p><p>Recent advances in biological research have seen the emergence of high-throughput technologies with numerous applications that allow the study of biological mechanisms at an unprecedented depth and scale. A large amount of genomic data is now distributed through consortia like The Cancer Genome Atlas (TCGA), where specific types of biological information on specific type of tissue or cell are available. In cancer research, the challenge is now to perform integrative analyses of high-dimensional multi-omic data with the goal to better understand genomic processes that correlate with cancer outcomes, e.g. elucidate gene networks that discriminate a specific cancer subgroups (cancer sub-typing) or discovering gene networks that overlap across different cancer types (pan-cancer studies). In this paper, we propose a novel mixed graphical model approach to analyze multi-omic data of different types (continuous, discrete and count) and perform model selection by extending the Birth-Death MCMC (BDMCMC) algorithm initially proposed by Stephens (2000) and later developed by Mohammadi and Wit (2015). We compare the performance of our method to the LASSO method and the standard BDMCMC method using simulations and find that our method is superior in terms of both computational efficiency and the accuracy of the model selection results. Finally, an application to the TCGA breast cancer data shows that integrating genomic information at different levels (mutation and expression data) leads to better subtyping of breast cancers.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 3","pages":"1958-1983"},"PeriodicalIF":1.3,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10569451/pdf/nihms-1886934.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41219379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PROBABILISTIC LEARNING OF TREATMENT TREES IN CANCER. 癌症治疗树的概率学习。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2023-09-01 Epub Date: 2023-09-07 DOI: 10.1214/22-aoas1696
Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Jinju Li, Veerabhadran Baladandayuthapani
{"title":"PROBABILISTIC LEARNING OF TREATMENT TREES IN CANCER.","authors":"Tsung-Hung Yao,&nbsp;Zhenke Wu,&nbsp;Karthik Bharath,&nbsp;Jinju Li,&nbsp;Veerabhadran Baladandayuthapani","doi":"10.1214/22-aoas1696","DOIUrl":"10.1214/22-aoas1696","url":null,"abstract":"<p><p>Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as patient-derived xenografts (PDX) have emerged as a unique study design evaluating multiple treatments administered to samples from the same human tumor implanted into genetically identical mice. In this paper, we propose a novel Bayesian probabilistic tree-based framework for PDX data to investigate the hierarchical relationships between treatments by inferring treatment cluster trees, referred to as treatment trees (R<sub>x</sub>-tree). The framework motivates a new metric of mechanistic similarity between two or more treatments accounting for inherent uncertainty in tree estimation; treatments with a high estimated similarity have potentially high mechanistic synergy. Building upon Dirichlet Diffusion Trees, we derive a closed-form marginal likelihood encoding the tree structure, which facilitates computationally efficient posterior inference via a new two-stage algorithm. Simulation studies demonstrate superior performance of the proposed method in recovering the tree structure and treatment similarities. Our analyses of a recently collated PDX dataset produce treatment similarity estimates that show a high degree of concordance with known biological mechanisms across treatments in five different cancers. More importantly, we uncover new and potentially effective combination therapies that confer synergistic regulation of specific downstream biological pathways for future clinical investigations. Our accompanying code, data, and shiny application for visualization of results are available at: https://github.com/bayesrx/RxTree.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 3","pages":"1884-1908"},"PeriodicalIF":1.8,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501503/pdf/nihms-1857187.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10308161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
BAYESIAN ANALYSIS FOR IMBALANCED POSITIVE-UNLABELLED DIAGNOSIS CODES IN ELECTRONIC HEALTH RECORDS. 电子病历中不平衡阳性未标记诊断码的贝叶斯分析。
IF 1.8 4区 数学
Annals of Applied Statistics Pub Date : 2023-06-01 DOI: 10.1214/22-AOAS1666
Ru Wang, Ye Liang, Zhuqi Miao, Tieming Liu
{"title":"BAYESIAN ANALYSIS FOR IMBALANCED POSITIVE-UNLABELLED DIAGNOSIS CODES IN ELECTRONIC HEALTH RECORDS.","authors":"Ru Wang,&nbsp;Ye Liang,&nbsp;Zhuqi Miao,&nbsp;Tieming Liu","doi":"10.1214/22-AOAS1666","DOIUrl":"https://doi.org/10.1214/22-AOAS1666","url":null,"abstract":"<p><p>With the increasing availability of electronic health records (EHR), significant progress has been made on developing predictive inference and algorithms by health data analysts and researchers. However, the EHR data are notoriously noisy due to missing and inaccurate inputs despite the information is abundant. One serious problem is that only a small portion of patients in the database has confirmatory diagnoses while many other patients remain undiagnosed because they did not comply with the recommended examinations. The phenomenon leads to a so-called positive-unlabelled situation and the labels are extremely imbalanced. In this paper, we propose a model-based approach to classify the unlabelled patients by using a Bayesian finite mixture model. We also discuss the label switching issue for the imbalanced data and propose a consensus Monte Carlo approach to address the imbalance issue and improve computational efficiency simultaneously. Simulation studies show that our proposed model-based approach outperforms existing positive-unlabelled learning algorithms. The proposed method is applied on the Cerner EHR for detecting diabetic retinopathy (DR) patients using laboratory measurements. With only 3% confirmatory diagnoses in the EHR database, we estimate the actual DR prevalence to be 25% which coincides with reported findings in the medical literature.</p>","PeriodicalId":50772,"journal":{"name":"Annals of Applied Statistics","volume":"17 2","pages":"1220-1238"},"PeriodicalIF":1.8,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10156089/pdf/nihms-1852796.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9563428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信