A General Purpose Phenotype Algorithm for Venous Thromboembolism Using Billing Codes and Natural Language Processing

E. M. Hinz, L. Bastarache, J. Denny
Published in: 2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology (2012-09-27)
DOI: 10.1109/HISB.2012.74 (https://doi.org/10.1109/HISB.2012.74)

Abstract

Deep venous thrombosis and pulmonary embolism are diseases associated with significant morbidity and mortality. Well-described risk factors for venous thromboembolic disease (VTE) include immobility, trauma, and genetic hypercoagulability states, yet many cases have no known antecedent risks. Studies that aim to identify the missing risk factors should ideally capture all cases of VTE. Defining VTE in the electronic health record is challenging because of the variable duration of VTE treatment, the use of the same therapeutic modalities for other chronic diseases, and preventive treatment related to hospitalizations. We designed a general-purpose natural language processing (NLP) algorithm to capture acute and historical cases of thromboembolic disease retrospectively in a de-identified electronic health record. Applied to a separate evaluation set, the NLP algorithm achieved a positive predictive value (PPV) of 84.7% and a sensitivity of 95.3%, for an F-measure of 0.897, which was similar to the training-set F-measure of 0.925. Applying the same algorithm to problem lists of patients without VTE ICD-9 codes resulted in a PPV of 83%. NLP applied to cases with VTE ICD-9 codes and to problem lists of patients without such codes provides an effective means of capturing both acute and historical cases of venous thromboembolic disease.
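
The F-measure reported for the evaluation set can be checked from the stated PPV and sensitivity. The sketch below assumes the balanced F-measure (F1), the harmonic mean of PPV and sensitivity; the abstract does not say which weighting was used, and the function name `f_measure` is purely illustrative.

```python
def f_measure(ppv: float, sensitivity: float) -> float:
    """Balanced F-measure: harmonic mean of PPV (precision) and sensitivity (recall).

    Assumption: the abstract's F-measure is the unweighted F1 score.
    """
    if ppv + sensitivity == 0:
        raise ValueError("PPV and sensitivity cannot both be zero")
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Evaluation-set figures quoted in the abstract, expressed as fractions.
evaluation_f = f_measure(0.847, 0.953)
print(round(evaluation_f, 3))  # prints 0.897
```

Under that assumption the result rounds to 0.897, matching the value given in the abstract.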