Using Electronic Health Records to Classify Cancer Site and Metastasis.

IF 2.2 | Zone 2, Medicine | JCR Q4, Medical Informatics
Applied Clinical Informatics Pub Date : 2025-05-01 Epub Date: 2025-06-18 DOI:10.1055/a-2544-3117
Kurt Kroenke, Kathryn J Ruddy, Deirdre R Pachman, Veronica Grzegorczyk, Jeph Herrin, Parvez A Rahman, Kyle A Tobin, Joan M Griffin, Linda L Chlan, Jessica D Austin, Jennifer L Ridgeway, Sandra A Mitchell, Keith A Marsolo, Andrea L Cheville
Applied Clinical Informatics, volume 16, issue 3, pages 556-568. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12176508/pdf/

Abstract

The Enhanced EHR-facilitated Cancer Symptom Control (E2C2) Trial is a pragmatic trial testing a collaborative care approach for managing common cancer symptoms. There were challenges in identifying cancer site and metastatic status.

This study compares three different approaches to determine cancer site and six strategies for identifying the presence of metastasis using EHR and cancer registry data.

The E2C2 cohort included 50,559 patients seen in the medical oncology clinics of a large health system. SPPADE symptoms were assessed with 0 to 10 numeric rating scales (NRS). A multistep process was used to develop three approaches for representing cancer site: the single most prevalent International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) code, the two most prevalent codes, and any diagnostic code. Six approaches for identifying metastatic disease were compared: ICD-10 codes, natural language processing (NLP), cancer registry, medications typically prescribed for incurable disease, treatment plan, and evaluation for phase 1 trials.

The approach counting the two most prevalent ICD-10 cancer site diagnoses per patient detected a median of 92% of the cases identified by counting all cancer site diagnoses, whereas the approach counting only the single most prevalent cancer site diagnosis identified a median of 65%. However, agreement among the three approaches was very good (kappa > 0.80) for most cancer sites. ICD and NLP methods could be applied to the entire cohort and had the highest agreement (kappa = 0.53) for identifying metastasis. Cancer registry data was available for less than half of the patients.

Identification of cancer site and metastatic disease using EHR data was feasible in this large and diverse cohort of patients with common cancer symptoms. The methods were pragmatic and may be acceptable for covariates, but likely require refinement for key dependent and independent variables.
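As a rough illustration of the comparisons described in the abstract, the following Python sketch ranks a patient's cancer-site codes by frequency and computes Cohen's kappa between two binary metastasis flags. It is not the authors' implementation. The function names, the sample codes, and the choice to group codes by their 3-character ICD-10 category are all assumptions made for this example; the abstract does not state how codes were grouped or how ties were handled.

```python
from collections import Counter


def top_k_sites(codes, k=None):
    """Return the k most frequent cancer-site categories for one patient.

    Assumption: a "site" is the 3-character ICD-10 category (e.g. "C50").
    With k=None every category that appears is returned ("any code").
    """
    counts = Counter(code[:3] for code in codes)
    # most_common orders equal counts by first appearance in the input
    return {site for site, _ in counts.most_common(k)}


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # share flagged positive by method A
    p_b = sum(b) / n  # share flagged positive by method B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both methods constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical diagnosis codes recorded for a single patient
codes = ["C50.9", "C50.9", "C50.1", "C34.1", "C34.1", "C18.7"]
single = top_k_sites(codes, 1)    # single most prevalent site
top_two = top_k_sites(codes, 2)   # two most prevalent sites
any_site = top_k_sites(codes)     # any recorded site

# Hypothetical metastasis flags from two methods for eight patients
icd_flag = [1, 1, 0, 0, 1, 0, 1, 0]
nlp_flag = [1, 0, 0, 0, 1, 1, 1, 0]
kappa = cohen_kappa(icd_flag, nlp_flag)

print(single, top_two, any_site, round(kappa, 2))
```

In this toy data the two methods agree on 6 of 8 patients and each flags half of them, so chance agreement is 0.5 and kappa is 0.5. Applying the same functions per cancer site across a cohort would give the kind of per-site capture and agreement figures the abstract reports.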

Source journal: Applied Clinical Informatics (Medical Informatics)
CiteScore: 4.60
Self-citation rate: 24.10%
Articles published: 132
About the journal: ACI is the third Schattauer journal dealing with biomedical and health informatics. It perfectly complements our other journals Methods of Information in Medicine and the Yearbook of Medical Informatics. The Yearbook of Medical Informatics being the "Milestone" or state-of-the-art journal and Methods of Information in Medicine being the "Science and Research" journal of IMIA, ACI intends to be the "Practical" journal of IMIA.