BAYESIAN DATA AUGMENTATION FOR RECURRENT EVENTS UNDER INTERMITTENT ASSESSMENT IN OVERLAPPING INTERVALS WITH APPLICATIONS TO EMR DATA.

IF 1.4 · Zone 4, Mathematics · Q2 STATISTICS & PROBABILITY
Annals of Applied Statistics Pub Date : 2025-06-01 Epub Date: 2025-05-28 DOI:10.1214/24-aoas2007
Xin Liu, Patrick M Schnell
Annals of Applied Statistics, volume 19, issue 2, pages 1332-1361. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393837/pdf/
Citations: 0

Abstract

Electronic medical records (EMR) data contain rich information that can facilitate health-related studies but are collected primarily for purposes other than research. For recurrent events, EMR data often do not record event times or counts but only contain intermittently assessed and censored observations (i.e., upper and/or lower bounds for counts in a time interval) at uncontrolled times. This can result in non-contiguous or overlapping assessment intervals with censored event counts. Existing methods for analyzing intermittently assessed recurrent events assume disjoint assessment intervals with known counts (interval count data) because they focus on prospective studies with controlled assessment times. We propose a Bayesian data augmentation method to analyze the complicated assessments in EMR data for recurrent events. Within a Gibbs sampler, event times are imputed by generating sets of event times from non-homogeneous Poisson processes and rejecting proposed sets that are incompatible with constraints imposed by assessment data. Based on the independent increments property of Poisson processes, we implement three techniques to speed up this rejection sampling imputation method for large EMR datasets: independent sampling by partitioning, truncated generation, and sequential sampling. In a simulation study, we show that our method accurately estimates parameters of log-linear Poisson process intensities. Although the proposed method can be applied generally to EMR data of recurrent events, our study is specifically motivated by identifying risk factors for falls due to cancer treatment and its supportive medications. We used the proposed method to analyze an EMR dataset comprising 5501 patients treated for breast cancer. Our analysis provides evidence supporting associations between certain risk factors (including classes of medications) and risk of falls.
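
The imputation step described in the abstract (propose a set of event times from a non-homogeneous Poisson process, reject it if it violates any assessment constraint) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it covers a single subject with a fixed, known intensity, omits the surrounding Gibbs sampler, and omits the three speed-up techniques (partitioning, truncated generation, sequential sampling). Proposals are drawn by thinning, which is a choice made for this sketch. The function names, the `(start, end, lower, upper)` constraint format, and the example intensity are all assumptions made for illustration.

```python
import math
import random


def sample_nhpp(intensity, lam_max, t_end, rng):
    """Draw event times on [0, t_end) from a non-homogeneous Poisson process
    by thinning. Assumes intensity(t) <= lam_max on [0, t_end)."""
    times, t = [], 0.0
    while True:
        # Candidate times come from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t >= t_end:
            return times
        # Keep each candidate with probability intensity(t) / lam_max.
        if rng.random() * lam_max <= intensity(t):
            times.append(t)


def compatible(times, constraints):
    """Return True if the event times satisfy every assessment.
    Each assessment is (start, end, lower, upper): the number of events
    in [start, end) must lie between lower and upper, inclusive."""
    for start, end, lower, upper in constraints:
        count = sum(start <= t < end for t in times)
        if count < lower or count > upper:
            return False
    return True


def impute_event_times(intensity, lam_max, t_end, constraints, rng,
                       max_tries=100_000):
    """Plain rejection sampling: propose whole sets of event times and
    return the first set compatible with all assessments."""
    for _ in range(max_tries):
        proposal = sample_nhpp(intensity, lam_max, t_end, rng)
        if compatible(proposal, constraints):
            return proposal
    raise RuntimeError("no compatible proposal found within max_tries")


# Hypothetical example: log-linear intensity exp(-1 + 0.3 t) on [0, 3).
intensity = lambda t: math.exp(-1.0 + 0.3 * t)
lam_max = math.exp(-1.0 + 0.3 * 3.0)  # upper bound, since intensity is increasing

# Two overlapping assessments with censored counts:
# at least one event in [0, 2), and at most one event in [1, 3).
constraints = [(0.0, 2.0, 1, math.inf), (1.0, 3.0, 0, 1)]

event_times = impute_event_times(intensity, lam_max, 3.0, constraints,
                                 random.Random(0))
print(event_times)
```

Because each proposal covers the whole observation window, the acceptance rate of this plain version drops quickly as the number of assessments grows, which is the cost the abstract's three acceleration techniques are meant to reduce.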

Source journal
Annals of Applied Statistics (Social Sciences: Statistics & Probability)
CiteScore: 3.10
Self-citation rate: 5.60%
Articles published: 131
Review time: 6-12 weeks
Journal description: Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, our goal is to provide a timely and unified forum for all areas of applied statistics.