Evaluation of the impact of defining observable time in real-world data on outcome incidence.

IF 4.6 | Zone 2, Medicine | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association Pub Date : 2025-09-01 DOI:10.1093/jamia/ocaf119

Clair Blacketer, Frank J DeFalco, Mitchell M Conover, Patrick B Ryan, Martijn J Schuemie, Peter R Rijnbeek

Pages: 1434-1444. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12361855/pdf/

Citations: 0

Abstract


Evaluation of the impact of defining observable time in real-world data on outcome incidence.

Objective: In real-world data (RWD), defining the observation period (the time during which a patient is considered observable) is critical for estimating incidence rates (IRs) and other outcomes. Yet, in the absence of explicit enrollment information, this period must often be inferred, introducing potential bias.

Materials and methods: This study evaluates methods for defining observation periods and their impact on IR estimates across multiple database types. We applied 3 methods for defining observation periods: (1) a persistence + surveillance window approach, (2) an age- and gender-adjusted method based on time between healthcare events, and (3) the min/max method. These were tested across 11 RWD databases, including both enrollment-based and encounter-based sources. Enrollment time was used as the reference standard in eligible databases. To assess the impact on epidemiologic results, we replicated a prior study of adverse event incidence, comparing IRs and calculating mean squared error between methods.
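To make the observation-period definitions above concrete, here is a minimal Python sketch. The min/max function follows the plain reading of the name (first to last recorded event). The persistence + surveillance function is only one plausible interpretation, assumed here for illustration: events separated by no more than a persistence window are chained into one period, and each period's end is extended by a surveillance window. The function names, parameters, and window values are hypothetical and are not taken from the study; the age- and gender-adjusted method is not sketched because the abstract does not describe it in enough detail.

```python
from datetime import date, timedelta


def min_max_period(event_dates):
    """Min/max method: a single period from the first to the last event."""
    return [(min(event_dates), max(event_dates))]


def persistence_surveillance_periods(event_dates, persistence_days, surveillance_days):
    """Assumed interpretation, not the authors' implementation.

    Events whose gap is <= persistence_days are chained into one period.
    Each period's end is extended by surveillance_days, capped so that it
    never overlaps the start of the next period.
    """
    ordered = sorted(event_dates)
    periods = []
    start = prev = ordered[0]
    for current in ordered[1:]:
        if (current - prev).days > persistence_days:
            extended_end = prev + timedelta(days=surveillance_days)
            latest_allowed = current - timedelta(days=1)
            periods.append((start, min(extended_end, latest_allowed)))
            start = current
        prev = current
    # Close the final period; nothing follows it, so no cap is needed.
    periods.append((start, prev + timedelta(days=surveillance_days)))
    return periods


def person_days(periods):
    """Total observable days across periods, counting both endpoints."""
    return sum((end - start).days + 1 for start, end in periods)


if __name__ == "__main__":
    # Hypothetical patient with three healthcare events.
    events = [date(2020, 1, 1), date(2020, 2, 1), date(2021, 6, 1)]
    print(person_days(min_max_period(events)))
    print(person_days(persistence_surveillance_periods(events, 180, 30)))
```

For a patient with a long gap between events, the min/max definition counts the whole gap as observable time, while the windowed definition splits it out. That difference in person-time is what feeds into the incidence-rate denominator.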

Results: Incidence rates decreased as observation periods lengthened, driven by increases in the person-time denominator. The persistence + surveillance method produced estimates closest to enrollment-based rates when appropriately balanced. The min/max approach yielded inconsistent results, particularly in encounter-based databases, with greater error observed in databases with longer time spans.
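The denominator effect described above can be illustrated with a short calculation. This is a sketch with made-up numbers, not data from the study: the event counts, person-years, and the choice of a per-1000-person-years scale are all assumptions, and the mean squared error shown is the ordinary definition, which may differ from how the authors aggregated error across outcomes and databases.

```python
def incidence_rate(events, person_years, per=1000.0):
    """Incidence rate: events per `per` person-years of observable time."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events * per / person_years


def mean_squared_error(estimates, references):
    """Average squared difference between estimates and reference values."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Same hypothetical 50 events; only the observable person-time changes.
    short_definition = incidence_rate(50, 10_000)  # shorter observation periods
    long_definition = incidence_rate(50, 20_000)   # longer observation periods
    print(short_definition, long_definition)

    # Compare against hypothetical enrollment-based reference rates.
    print(mean_squared_error([short_definition, long_definition], [4.0, 4.0]))
```

With the event count held fixed, doubling the person-time halves the rate, which is the mechanism the results attribute to longer observation periods.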

Discussion: These findings suggest that assumptions about data completeness and population observability significantly affect incidence estimates. Observation period definitions substantially influence outcome measurement in RWD studies.

Conclusion: Standardized, transparent approaches are necessary to ensure valid, reproducible results, especially in databases lacking defined enrollment.


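Source journal: Journal of the American Medical Informatics Association (category: Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks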

Journal description: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.