Using machine learning to standardize medication records in a pan-Canadian electronic medical record database: a data-driven algorithm study focused on antibiotics prescribed in primary care.

CMAJ open. Pub Date: 2023-10-31. Print Date: 2023-09-01. DOI: 10.9778/cmajo.20220235

Stephanie Garies, Matt Taylor, Boglarka Soos, Cliff Lindeman, Neil Drummond, Anh Pham, Zhi Aponte-Hao, Tyler Williamson

CMAJ open, volume 11, issue 5, pages E1020-E1024. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620009/pdf/


Abstract

Background: Most antibiotics dispensed by community pharmacies in Canada are prescribed by family physicians, but using the prescribing information contained within primary care electronic medical records (EMRs) for secondary purposes can be challenging owing to variable data quality. We used antibiotic medications as an exemplar to validate a machine-learning approach for cleaning and coding medication data in a pan-Canadian primary care EMR database.

Methods: The Canadian Primary Care Sentinel Surveillance Network database contained an estimated 42 million medication records, each of which we mapped to an Anatomical Therapeutic Chemical (ATC) code by applying a semisupervised classification model developed using reference standard labels derived from the Health Canada Drug Product Database. We validated the resulting ATC codes in a subset of antibiotic records (16 119 unique strings) to determine whether the algorithm correctly classified the medication according to manual review of the original medication record.
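To make the idea of semisupervised mapping concrete, the following is a minimal sketch only, not the authors' model. It assumes self-training (one common semisupervised technique) over character-trigram similarity: a small reference table of labelled strings seeds the process, and free-text strings that are confidently matched are added to the labelled pool so that later rounds can match noisier variants. The function names, the 0.6 threshold, the example strings, and the rule that ATC codes beginning with J01 count as antibiotics are all assumptions made for illustration.

```python
def trigrams(text):
    """Return the set of character trigrams of a lower-cased, padded string."""
    s = f"  {text.lower().strip()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def self_train(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Self-training: repeatedly label strings whose nearest labelled
    neighbour is at least `threshold` similar, then reuse them as labels.

    labeled:   dict mapping medication text -> ATC code (reference labels)
    unlabeled: list of free-text medication strings
    Returns (all labels after training, strings left unlabelled).
    """
    labeled = dict(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        newly = {}
        for text in remaining:
            grams = trigrams(text)
            best_ref, best_sim = max(
                ((ref, jaccard(grams, trigrams(ref))) for ref in labeled),
                key=lambda pair: pair[1],
            )
            if best_sim >= threshold:
                newly[text] = labeled[best_ref]
        if not newly:  # nothing confident enough: stop early
            break
        labeled.update(newly)
        remaining = [t for t in remaining if t not in newly]
    return labeled, remaining


def is_antibiotic(atc_code):
    """Assumption: treat ATC group J01 (systemic antibacterials) as antibiotics."""
    return atc_code.startswith("J01")


# Hypothetical reference labels in the style of a drug product database.
reference = {
    "amoxicillin 500 mg capsule": "J01CA04",
    "azithromycin 250 mg tablet": "J01FA10",
    "metformin 500 mg tablet": "A10BA02",
}
# Hypothetical free-text EMR strings, including a misspelling.
free_text = ["AMOXICILLIN 500MG CAP", "amoxicilin 500mg cap", "vitamin d 1000 iu"]

coded, uncoded = self_train(reference, free_text)
for text in free_text:
    code = coded.get(text)
    print(text, "->", code, "| antibiotic:", bool(code) and is_antibiotic(code))
```

In this toy run the misspelled string is not similar enough to any reference label on its own, but it is picked up in a later round once the cleaner variant has been labelled, which is the behaviour that distinguishes self-training from a single lookup against the reference table.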

Results: In the antibiotic subset, the algorithm showed high validity (sensitivity 99.5%, specificity 92.4%, positive predictive value 98.6%, negative predictive value 97.0%) in classifying whether the medication was an antibiotic.
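The four validity measures reported above are all derived from a two-by-two comparison of the algorithm's classification against manual review. The sketch below shows the standard definitions; the `Confusion` class, the `validity` function, and the counts used are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing algorithm output with manual review."""
    tp: int  # algorithm: antibiotic, review: antibiotic
    fp: int  # algorithm: antibiotic, review: not antibiotic
    fn: int  # algorithm: not antibiotic, review: antibiotic
    tn: int  # algorithm: not antibiotic, review: not antibiotic


def validity(c):
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true antibiotics detected
        "specificity": c.tn / (c.tn + c.fp),  # non-antibiotics correctly excluded
        "ppv": c.tp / (c.tp + c.fp),          # flagged records that are antibiotics
        "npv": c.tn / (c.tn + c.fn),          # unflagged records that are not
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = validity(Confusion(tp=90, fp=5, fn=10, tn=95))
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity describe the algorithm itself, whereas the predictive values also depend on how common antibiotics are in the validation subset, so they would likely differ if the same algorithm were assessed on the full set of medication records.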

Interpretation: Our machine-learning algorithm classified unstructured antibiotic medication data from primary care with a high degree of accuracy. Access to cleaned EMR data can support important secondary uses, including community-based antibiotic prescribing surveillance and practice improvement.

