A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital.

Impact factor: 2.5; JCR quartile: Q2; category: Health Care Sciences & Services

JAMIA Open. Publication date: 2023-07-05. eCollection date: 2023-10-01. DOI: 10.1093/jamiaopen/ooad047

Lijing Wang, Amy R Zipursky, Alon Geva, Andrew J McMurry, Kenneth D Mandl, Timothy A Miller


Citations: 0

Abstract



A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital.

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR).

Materials and methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier.
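The proxy-dataset idea can be sketched in a few lines: train a text classifier on notes from patients who do have an in-hospital PCR result, using that result as the label, then apply it to notes from patients without one. The abstract does not name the classifier or the feature representation, so the multinomial naive Bayes model over unigram counts below is a stand-in chosen for brevity, and every note and label is invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into word tokens (bag-of-words view)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing.

    An illustrative stand-in, not the classifier used in the paper.
    """

    def fit(self, notes, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for note, label in zip(notes, labels):
            self.word_counts[label].update(tokenize(note))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, note):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(note):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical proxy training set: notes from patients WITH an in-hospital
# PCR result, labeled by that result.
proxy_notes = [
    "patient tested positive for covid at urgent care two days ago",
    "covid positive per outside test, presenting with cough and fever",
    "covid swab negative, symptoms consistent with asthma exacerbation",
    "no known covid exposure, test negative, here for ankle injury",
]
proxy_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesTextClassifier().fit(proxy_notes, proxy_labels)

# Hypothetical target instance: no in-hospital PCR, only a mention in the text.
unlabeled_note = "mother reports child tested positive for covid at outside clinic"
prediction = model.predict(unlabeled_note)
print(prediction)
```

In this setup the labels come for free from structured lab data, which is what lets the approach avoid manual annotation; a manually reviewed sample of the unlabeled instances is still needed to check that the classifier transfers.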

Results: On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases as not having SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19.
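The reported figures can be recomputed directly from the numbers in the paragraph above. The short check below uses only those numbers; the variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Proxy test split: precision 0.6 and recall 0.52 for the positive class.
f1 = f1_score(0.6, 0.52)
print(round(f1, 2))

# Expert validation: agreement fractions exactly as printed in the abstract.
pos_agreement = 81 / 84
neg_agreement = 91 / 93
print(f"{pos_agreement:.1%}")
print(f"{neg_agreement:.1%}")

# Share of the additional 960 cases that carried the ICD-10 code for COVID-19.
icd_share = 177 / 960
print(f"{icd_share:.1%}")
```

The F1 value rounds to the reported 0.56, and 91/93 matches the reported 97.8%. Evaluated as printed, 81/84 comes to about 96.4% rather than the stated 97.6%; the abstract alone does not show whether the fraction or the percentage is off. The last line shows that roughly 18.4% of the additional 960 cases had the ICD-10 code.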

Discussion: Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned.

Conclusion: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.


Source journal

JAMIA Open (Medicine, Health Informatics)

CiteScore: 4.10

Self-citation rate: 4.80%

Articles published: 102

Review time: 16 weeks