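A Data-Driven Iterative Approach for Semi-automatically Assessing the Correctness of Medication Value Sets: A Proof of Concept Based on Opioids.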

IF 1.8 · Zone 4 (Medicine) · JCR Q3 · COMPUTER SCIENCE, INFORMATION SYSTEMS

Methods of Information in Medicine. Pub Date: 2021-12-01. Epub Date: 2021-12-29. Pages: e111-e119. DOI: 10.1055/s-0041-1740358

Linyi Li, Adela Grando, Abeed Sarker


Citations: 0

Abstract



Background: Value sets are lists of terms (e.g., opioid medication names) and their corresponding codes from standard clinical vocabularies (e.g., RxNorm) created with the intent of supporting health information exchange and research. Value sets are manually created and often exhibit errors.

Objectives: The aim of the study is to develop a semi-automatic, data-centric natural language processing (NLP) method to assess medication-related value set correctness and evaluate it on a set of opioid medication value sets.

Methods: We developed an NLP algorithm that utilizes value sets containing mostly true positives and true negatives to learn lexical patterns associated with the true positives, and then employs these patterns to identify potential errors in unseen value sets. We evaluated the algorithm on a set of opioid medication value sets, using the recall, precision and F₁-score metrics. We applied the trained model to assess the correctness of unseen opioid value sets based on recall. To replicate the application of the algorithm in real-world settings, a domain expert manually conducted error analysis to identify potential system and value set errors.

Results: Thirty-eight value sets were retrieved from the Value Set Authority Center, and six (two opioid, four non-opioid) were used to develop and evaluate the system. Average precision, recall, and F₁-score were 0.932, 0.904, and 0.909, respectively, on uncorrected value sets, and 0.958, 0.953, and 0.953, respectively, after manual correction of the same value sets. On 20 unseen opioid value sets, the algorithm obtained an average recall of 0.89. Error analyses revealed that the main sources of system misclassifications were differences in how opioids were coded in the value sets: while the training value sets mostly had generic names, some of the unseen value sets had new trade names and ingredients.

Conclusion: The proposed approach is data-centric, reusable, customizable, and not resource-intensive. It may help domain experts validate value sets easily.
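The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: learn lexical patterns from labeled value set terms, flag members of an unseen value set that match none of them, and score the result with precision, recall, and F₁. The token-based rule, every function name, and all term strings are assumptions made for illustration; they are not the authors' implementation and are not taken from any real value set.

```python
import re
from collections import Counter


def tokenize(term):
    """Lowercase a term and return its alphabetic tokens (digits and punctuation are dropped)."""
    return re.findall(r"[a-z]+", term.lower())


def learn_patterns(positive_terms, negative_terms, min_count=1):
    """Keep tokens seen in at least min_count positive terms and in no negative term."""
    positive_counts = Counter(
        token for term in positive_terms for token in set(tokenize(term))
    )
    negative_tokens = {token for term in negative_terms for token in tokenize(term)}
    return {
        token
        for token, count in positive_counts.items()
        if count >= min_count and token not in negative_tokens
    }


def matches(term, patterns):
    """True if the term contains at least one learned token."""
    return any(token in patterns for token in tokenize(term))


def flag_potential_errors(value_set_terms, patterns):
    """Return the terms that match no learned pattern, for expert review."""
    return [term for term in value_set_terms if not matches(term, patterns)]


def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for boolean gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical training terms (illustrative strings only).
opioid_terms = [
    "morphine sulfate 10 MG Oral Tablet",
    "oxycodone hydrochloride 5 MG Oral Capsule",
    "fentanyl 0.05 MG/HR Transdermal System",
]
non_opioid_terms = [
    "ibuprofen 200 MG Oral Tablet",
    "metformin hydrochloride 500 MG Oral Tablet",
    "amoxicillin 250 MG Oral Capsule",
    "ferrous sulfate 325 MG Oral Tablet",
    "nicotine 0.5 MG/HR Transdermal System",
]
patterns = learn_patterns(opioid_terms, non_opioid_terms)

# Hypothetical unseen "opioid" value set with expert (gold) labels.
unseen_terms = [
    "morphine sulfate 2 MG/ML Injectable Solution",
    "Percocet 5 MG Oral Tablet",
    "ibuprofen 400 MG Oral Tablet",
]
gold = [True, True, False]

predicted = [matches(term, patterns) for term in unseen_terms]
flagged = flag_potential_errors(unseen_terms, patterns)
precision, recall, f1 = precision_recall_f1(gold, predicted)

print(sorted(patterns))
print(flagged)
print(precision, recall, f1)
```

In this toy run the trade-name entry is flagged even though the product contains an opioid, because no trade names appeared in the training terms. That is the same kind of miss the error analysis attributes to new trade names and ingredients, and it is why flagged terms go to a domain expert rather than being removed automatically.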


Source journal

Methods of Information in Medicine (Medicine / Computer Science: Information Systems)

CiteScore

3.70

Self-citation rate

11.80%

Articles published

Review time

6-12 weeks

Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal's issue.