Toward a Computable Phenotype for Determining Eligibility of Lung Cancer Screening Using Electronic Health Records

IF 3.3 Q2 ONCOLOGY
JCO Clinical Cancer Informatics, vol. 9, e2400139. Pub Date: 2025-01-01; Epub Date: 2025-01-16. DOI: 10.1200/CCI.24.00139
Shuang Yang, Yu Huang, Xiwei Lou, Tianchen Lyu, Ruoqi Wei, Hiren J Mehta, Yonghui Wu, Michelle Alvarado, Ramzi G Salloum, Dejana Braithwaite, Jinhai Huo, Ya-Chen Tina Shih, Yi Guo, Jiang Bian
Citations: 0

Abstract

Purpose: Lung cancer screening (LCS) has the potential to reduce mortality and detect lung cancer at an early stage, but the high false-positive rate of low-dose computed tomography (LDCT) used for LCS is a barrier to its widespread adoption. This study aims to develop computable phenotype (CP) algorithms based on electronic health records (EHRs) to determine individuals' eligibility for LCS, thereby improving LCS utilization in real-world settings.

Materials and methods: The study cohort comprised 5,778 individuals who underwent LDCT for LCS between 2012 and 2022, as recorded in the University of Florida Health Integrated Data Repository. CP rules derived from LCS guidelines were applied to identify potential candidates, drawing on both structured EHR data and clinical notes analyzed with natural language processing (NLP). We then manually reviewed 453 randomly selected charts to refine and validate these rules, assessing CP performance with metrics such as F1 score, specificity, and sensitivity.
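The validation step compares CP output against manual chart review, which serves as the reference standard. A minimal Python sketch of how sensitivity, specificity, and F1 can be computed from paired labels follows; the function name `evaluate` and the toy labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    sensitivity: float  # recall: TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    precision: float    # positive predictive value: TP / (TP + FP)
    f1: float           # harmonic mean of precision and sensitivity


def evaluate(predicted: list[bool], reviewed: list[bool]) -> Metrics:
    """Compare CP eligibility predictions with chart-review labels (reference standard)."""
    if len(predicted) != len(reviewed):
        raise ValueError("predicted and reviewed must have the same length")
    pairs = list(zip(predicted, reviewed))
    tp = sum(p and r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    fn = sum(not p and r for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)

    def ratio(num: float, den: float) -> float:
        # Guard against empty categories instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return Metrics(sensitivity, specificity, precision, f1)
```

For example, `evaluate([True, True, False, True, False], [True, False, False, True, True])` has 2 true positives, 1 false positive, 1 false negative, and 1 true negative, giving sensitivity 2/3, specificity 0.5, and F1 2/3.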

Results: We developed an optimal CP rule that integrates both structured and unstructured data and follows the US Preventive Services Task Force (USPSTF) 2013 and 2020 guidelines. The rule is based on age (55-80 years for 2013; 50-80 years for 2020), smoking status (current, former, and other), and pack-years (≥30 for 2013; ≥20 for 2020), and achieved F1 scores of 0.75 and 0.84 under the 2013 and 2020 guidelines, respectively. Compared with structured data alone, adding unstructured data improved the F1 score by up to 9.2% for 2013 and 12.9% for 2020.
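The rule described above can be sketched as a simple check. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the names `SmokingFacts`, `merge`, and `is_eligible`, the policy of preferring structured fields and falling back to NLP-extracted values, and the set of smoking-status values treated as qualifying are all hypothetical. Pack-years are computed as packs per day multiplied by years smoked. The USPSTF recommendations also require former smokers to have quit within the past 15 years; the sketch omits that criterion because the abstract lists only age, smoking status, and pack-years.

```python
from dataclasses import dataclass
from typing import Optional

# Age and pack-year thresholds as stated in the abstract (bounds inclusive).
GUIDELINES = {
    2013: {"min_age": 55, "max_age": 80, "min_pack_years": 30.0},
    2020: {"min_age": 50, "max_age": 80, "min_pack_years": 20.0},
}

# ASSUMPTION: smoking-status values that count as qualifying in this sketch.
QUALIFYING_STATUSES = frozenset({"current", "former"})


@dataclass
class SmokingFacts:
    """Smoking information from one source (structured fields or NLP output)."""
    status: Optional[str] = None          # e.g. "current", "former", "never"
    packs_per_day: Optional[float] = None
    years_smoked: Optional[float] = None
    pack_years: Optional[float] = None    # used when documented directly


def pack_years(facts: SmokingFacts) -> Optional[float]:
    """Return documented pack-years, else packs/day x years smoked, else None."""
    if facts.pack_years is not None:
        return facts.pack_years
    if facts.packs_per_day is not None and facts.years_smoked is not None:
        return facts.packs_per_day * facts.years_smoked
    return None


def merge(structured: SmokingFacts,
          nlp: SmokingFacts) -> tuple[Optional[str], Optional[float]]:
    """HYPOTHETICAL policy: prefer structured data, fall back to NLP per element."""
    status = structured.status if structured.status is not None else nlp.status
    py = pack_years(structured)
    if py is None:
        py = pack_years(nlp)
    return status, py


def is_eligible(age: int, status: Optional[str], py: Optional[float],
                guideline: int = 2020) -> bool:
    """Apply the age, smoking-status, and pack-year criteria for one guideline."""
    rule = GUIDELINES[guideline]
    if not rule["min_age"] <= age <= rule["max_age"]:
        return False
    if status is None or status.lower() not in QUALIFYING_STATUSES:
        return False
    return py is not None and py >= rule["min_pack_years"]
```

For a 60-year-old former smoker whose structured record has no smoking quantity but whose clinical note yields 1 pack per day for 25 years, `merge` returns `("former", 25.0)`; `is_eligible` then returns `True` under the 2020 thresholds and `False` under 2013 (25 < 30 pack-years). Without the NLP fallback, pack-years would be `None` and the person would be missed under both guidelines, which is one way unstructured data can raise sensitivity.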

Conclusion: Our findings underscore the critical need for better documentation of smoking information in EHRs, demonstrate the value of artificial intelligence techniques in improving CP performance, and confirm the effectiveness of EHR-based CPs in identifying LCS-eligible individuals, supporting their potential to aid clinical decision-making and optimize patient care.

Source journal metrics: CiteScore 6.20; self-citation rate 4.80%; articles published 190.