Semi-supervised approach to event time annotation using longitudinal electronic health records.

IF 1.2 · CAS Zone 3 (Mathematics) · JCR Q3 · MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Lifetime Data Analysis Pub Date : 2022-07-01 DOI:10.1007/s10985-022-09557-5

Liang Liang, Jue Hou, Hajime Uno, Kelly Cho, Yanyuan Ma, Tianxi Cai

Lifetime Data Analysis, vol. 28, no. 3, pp. 428-491. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10044535/pdf/nihms-1879201.pdf

Citations: 4

Abstract

Semi-supervised approach to event time annotation using longitudinal electronic health records.

Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models from real-world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes, such as time to cancer progression, is not readily available in these databases. The true clinical event times typically cannot be approximated well from simple extracts of billing or procedure codes, while annotating event times manually is prohibitive in time and resources. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes in the labeled data, using the features derived in step I and approximating the non-parametric baseline function with B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-n consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.
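
As a rough illustration of the model form named in step II, the sketch below evaluates event probabilities under a proportional odds model, logit P(T ≤ t | Z) = α(t) + βᵀZ. This is one common parameterization, and the paper's sign convention may differ. Everything concrete here is an assumption made for illustration: the closed-form baseline `log_baseline_odds` stands in for the B-spline approximation, and the coefficient values and feature scores are hypothetical rather than taken from the paper. No penalization or fitting is shown.

```python
import math


def log_baseline_odds(t):
    """Hypothetical nondecreasing baseline alpha(t) = log(t / 12).

    The abstract describes an unspecified baseline approximated with
    B-splines; this closed form is only a stand-in for illustration.
    """
    return math.log(t / 12.0)


def event_probability(t, features, beta, alpha=log_baseline_odds):
    """P(T <= t | Z) when logit P(T <= t | Z) = alpha(t) + beta^T Z."""
    linear = sum(b * z for b, z in zip(beta, features))
    return 1.0 / (1.0 + math.exp(-(alpha(t) + linear)))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


beta = [0.8, -0.5]       # hypothetical feature effects
patient_a = [1.2, 0.3]   # hypothetical step-I feature scores
patient_b = [0.2, 0.3]

for month in (6, 12, 24):
    pa = event_probability(month, patient_a, beta)
    pb = event_probability(month, patient_b, beta)
    # The odds ratio between the two patients does not depend on the month.
    print(month, round(pa, 3), round(pb, 3), round(odds(pa) / odds(pb), 3))
```

The defining property is visible in the last printed column: the two patients differ by 1.0 in the first feature only, so their odds ratio stays at exp(0.8) at every time point, whatever baseline is plugged in.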

Source journal

Lifetime Data Analysis (Mathematics: Mathematics, Interdisciplinary Applications)

CiteScore: 2.30

Self-citation rate: 7.70%

Articles published: not listed

Review time: 3 months

About the journal: The objective of Lifetime Data Analysis is to advance and promote statistical science in the various applied fields that deal with lifetime data, including Actuarial Science, Economics, Engineering Sciences, Environmental Sciences, Management Science, Medicine, Operations Research, Public Health, and Social and Behavioral Sciences.