Thomas Z Li, Kaiwen Xu, Neil C Chada, Heidi Chen, Michael Knight, Sanja Antic, Kim L Sandler, Fabien Maldonado, Bennett A Landman, Thomas A Lasko
{"title":"为有肺癌风险的社区队列收集回顾性多模态和纵向数据。","authors":"Thomas Z Li, Kaiwen Xu, Neil C Chada, Heidi Chen, Michael Knight, Sanja Antic, Kim L Sandler, Fabien Maldonado, Bennett A Landman, Thomas A Lasko","doi":"10.3233/CBM-230340","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large community cohorts are useful for lung cancer research, allowing for the analysis of risk factors and development of predictive models.</p><p><strong>Objective: </strong>A robust methodology for (1) identifying lung cancer and pulmonary nodules diagnoses as well as (2) associating multimodal longitudinal data with these events from electronic health record (EHRs) is needed to optimally curate cohorts at scale.</p><p><strong>Methods: </strong>In this study, we leveraged (1) SNOMED concepts to develop ICD-based decision rules for building a cohort that captured lung cancer and pulmonary nodules and (2) clinical knowledge to define time windows for collecting longitudinal imaging and clinical concepts. We curated three cohorts with clinical data and repeated imaging for subjects with pulmonary nodules from our Vanderbilt University Medical Center.</p><p><strong>Results: </strong>Our approach achieved an estimated sensitivity 0.930 (95% CI: [0.879, 0.969]), specificity of 0.996 (95% CI: [0.989, 1.00]), positive predictive value of 0.979 (95% CI: [0.959, 1.000]), and negative predictive value of 0.987 (95% CI: [0.976, 0.994]) for distinguishing lung cancer from subjects with SPNs.</p><p><strong>Conclusion: </strong>This work represents a general strategy for high-throughput curation of multi-modal longitudinal cohorts at risk for lung cancer from routinely collected EHRs.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11380038/pdf/","citationCount":"0","resultStr":"{\"title\":\"Curating retrospective multimodal and longitudinal data for community cohorts at risk for lung cancer.\",\"authors\":\"Thomas Z Li, Kaiwen Xu, Neil C Chada, Heidi Chen, Michael Knight, Sanja Antic, Kim L Sandler, Fabien Maldonado, Bennett A Landman, Thomas A Lasko\",\"doi\":\"10.3233/CBM-230340\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Large community cohorts are useful for lung cancer research, allowing for the analysis of risk factors and development of predictive models.</p><p><strong>Objective: </strong>A robust methodology for (1) identifying lung cancer and pulmonary nodules diagnoses as well as (2) associating multimodal longitudinal data with these events from electronic health record (EHRs) is needed to optimally curate cohorts at scale.</p><p><strong>Methods: </strong>In this study, we leveraged (1) SNOMED concepts to develop ICD-based decision rules for building a cohort that captured lung cancer and pulmonary nodules and (2) clinical knowledge to define time windows for collecting longitudinal imaging and clinical concepts. We curated three cohorts with clinical data and repeated imaging for subjects with pulmonary nodules from our Vanderbilt University Medical Center.</p><p><strong>Results: </strong>Our approach achieved an estimated sensitivity 0.930 (95% CI: [0.879, 0.969]), specificity of 0.996 (95% CI: [0.989, 1.00]), positive predictive value of 0.979 (95% CI: [0.959, 1.000]), and negative predictive value of 0.987 (95% CI: [0.976, 0.994]) for distinguishing lung cancer from subjects with SPNs.</p><p><strong>Conclusion: </strong>This work represents a general strategy for high-throughput curation of multi-modal longitudinal cohorts at risk for lung cancer from routinely collected EHRs.</p>\",\"PeriodicalId\":2,\"journal\":{\"name\":\"ACS Applied Bio Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2024-03-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11380038/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Bio Materials\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3233/CBM-230340\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATERIALS SCIENCE, BIOMATERIALS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3233/CBM-230340","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
Curating retrospective multimodal and longitudinal data for community cohorts at risk for lung cancer.
Background: Large community cohorts are useful for lung cancer research, allowing for the analysis of risk factors and development of predictive models.
Objective: A robust methodology for (1) identifying lung cancer and pulmonary nodules diagnoses as well as (2) associating multimodal longitudinal data with these events from electronic health record (EHRs) is needed to optimally curate cohorts at scale.
Methods: In this study, we leveraged (1) SNOMED concepts to develop ICD-based decision rules for building a cohort that captured lung cancer and pulmonary nodules and (2) clinical knowledge to define time windows for collecting longitudinal imaging and clinical concepts. We curated three cohorts with clinical data and repeated imaging for subjects with pulmonary nodules from our Vanderbilt University Medical Center.
Results: Our approach achieved an estimated sensitivity 0.930 (95% CI: [0.879, 0.969]), specificity of 0.996 (95% CI: [0.989, 1.00]), positive predictive value of 0.979 (95% CI: [0.959, 1.000]), and negative predictive value of 0.987 (95% CI: [0.976, 0.994]) for distinguishing lung cancer from subjects with SPNs.
Conclusion: This work represents a general strategy for high-throughput curation of multi-modal longitudinal cohorts at risk for lung cancer from routinely collected EHRs.