A Novel Sentence Transformer-based Natural Language Processing Approach for Schema Mapping of Electronic Health Records to the OMOP Common Data Model.

AMIA ... Annual Symposium proceedings. AMIA Symposium Pub Date : 2025-05-22 eCollection Date: 2024-01-01

Xinyu Zhou, Lovedeep Singh Dhingra, Arya Aminorroaya, Philip Adejumo, Rohan Khera

{"title":"A Novel Sentence Transformer-based Natural Language Processing Approach for Schema Mapping of Electronic Health Records to the OMOP Common Data Model.","authors":"Xinyu Zhou, Lovedeep Singh Dhingra, Arya Aminorroaya, Philip Adejumo, Rohan Khera","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Mapping electronic health records (EHR) data to common data models (CDMs) enables the standardization of clinical records, enhancing interoperability and enabling large-scale, multi-centered clinical investigations. Using 2 large publicly available datasets, we developed transformer-based natural language processing models to map medication-related concepts from the EHR at a large and diverse healthcare system to standard concepts in OMOP CDM. We validated the model outputs against standard concepts manually mapped by clinicians. Our best model reached out-of-box accuracies of 96.5% in mapping the 200 most common drugs and 83.0% in mapping 200 random drugs in the EHR. For these tasks, this model outperformed a state-of-the-art large language model (SFR-Embedding-Mistral, 89.5% and 66.5% in accuracy for the two tasks), a widely used software for schema mapping (Usagi, 90.0% and 70.0% in accuracy), and direct string match (7.5% and 7.5% accuracy). Transformer-based deep learning models outperform existing approaches in the standardized mapping of EHR elements and can facilitate an end-to-end automated EHR transformation pipeline.</p>","PeriodicalId":72180,"journal":{"name":"AMIA ... Annual Symposium proceedings. AMIA Symposium","volume":"2024 ","pages":"1332-1339"},"PeriodicalIF":0.0000,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12099400/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AMIA ... Annual Symposium proceedings. AMIA Symposium","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Mapping electronic health records (EHR) data to common data models (CDMs) enables the standardization of clinical records, enhancing interoperability and enabling large-scale, multi-centered clinical investigations. Using 2 large publicly available datasets, we developed transformer-based natural language processing models to map medication-related concepts from the EHR at a large and diverse healthcare system to standard concepts in OMOP CDM. We validated the model outputs against standard concepts manually mapped by clinicians. Our best model reached out-of-box accuracies of 96.5% in mapping the 200 most common drugs and 83.0% in mapping 200 random drugs in the EHR. For these tasks, this model outperformed a state-of-the-art large language model (SFR-Embedding-Mistral, 89.5% and 66.5% in accuracy for the two tasks), a widely used software for schema mapping (Usagi, 90.0% and 70.0% in accuracy), and direct string match (7.5% and 7.5% accuracy). Transformer-based deep learning models outperform existing approaches in the standardized mapping of EHR elements and can facilitate an end-to-end automated EHR transformation pipeline.

本刊更多论文

基于句子转换器的电子健康记录到OMOP公共数据模型模式映射的自然语言处理方法。

将电子健康记录（EHR）数据映射到公共数据模型（cdm）可以实现临床记录的标准化，增强互操作性并实现大规模、多中心的临床调查。使用2个大型公开可用的数据集，我们开发了基于转换器的自然语言处理模型，将大型多样化医疗保健系统中的EHR中与药物相关的概念映射到OMOP CDM中的标准概念。我们根据临床医生手动映射的标准概念验证了模型输出。我们的最佳模型在绘制200种最常见药物的图谱时达到了96.5%的开箱外准确率，在绘制200种随机药物的图谱时达到了83.0%。对于这些任务，该模型优于最先进的大型语言模型（sr - embedging - mistral，两个任务的准确率分别为89.5%和66.5%）、广泛使用的模式映射软件（Usagi，准确率分别为90.0%和70.0%）和直接字符串匹配（准确率分别为7.5%和7.5%）。基于转换器的深度学习模型在EHR元素的标准化映射方面优于现有方法，并且可以促进端到端的自动化EHR转换管道。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

AMIA ... Annual Symposium proceedings. AMIA Symposium

自引率

0.00%

发文量