A machine learning approach for automating review of a RxNorm medication mapping pipeline output
Matthias Hüser, John Doole, Vinicius Pinho, Hossein Rouhizadeh, Douglas Teodoro, Ahson Saiyed, Matvey B. Palchuk
Journal of Biomedical Informatics, volume 170, Article 104909. Published 2025-09-11. DOI: 10.1016/j.jbi.2025.104909
https://www.sciencedirect.com/science/article/pii/S1532046425001388
Abstract
Objective:
Medication mapping to standardized terminologies is an important prerequisite for performing analytics on a federated EHR network. TriNetX LLC operates the largest such network in the world.
Methods:
We report on a novel pipeline, RxEmbed, that uses large language models (LLMs) to map and bind local medication descriptions to RxNorm ingredient codes, and that uses machine learning to automate review of the resulting mappings.
Results:
RxEmbed was evaluated on a public data set from France and on data from six healthcare organizations in the TriNetX federated EHR network, located in the United States and Brazil. On the public data set, RxEmbed outperformed two recently reported LLM-based baselines in both recall and precision of the generated mappings. On TriNetX network data, RxEmbed achieved RxNorm mapping recalls of 84%–93% at precisions of 99.5%–100%.
Conclusion:
We built and evaluated an LLM-based medication mapping pipeline that binds local medication descriptions from EHR systems to RxNorm ingredient codes. The high precision of the pipeline output implies that the generated mappings need very little human review.
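As a reading aid for the recall and precision figures above, the following is a minimal sketch of one common way to score a mapping pipeline against a reference set. It is not the authors' evaluation code, and the abstract does not state their exact metric definitions. The function name `mapping_precision_recall`, the toy medication strings, and the choice to count abstentions against recall are all assumptions made for illustration; the codes are shown only as example RxNorm ingredient identifiers.

```python
from typing import Dict, Optional, Tuple


def mapping_precision_recall(
    predicted: Dict[str, Optional[str]],
    gold: Dict[str, str],
) -> Tuple[float, float]:
    """Score generated mappings against a reference mapping.

    predicted: local description -> predicted ingredient code,
               or None when the pipeline abstains (emits no mapping).
    gold:      local description -> reference ingredient code.

    Assumed definitions (not taken from the paper):
      precision = correct mappings / mappings emitted
      recall    = correct mappings / descriptions in the reference
    """
    # Keep only mappings the pipeline actually emitted for reference terms.
    emitted = {
        term: code
        for term, code in predicted.items()
        if code is not None and term in gold
    }
    correct = sum(1 for term, code in emitted.items() if gold[term] == code)
    precision = correct / len(emitted) if emitted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: four local descriptions with reference codes.
gold = {
    "paracetamol 500 mg cp": "161",
    "aspirine 100 mg": "1191",
    "metformine 850 mg": "6809",
    "ibuprofene 400": "5640",
}

# The pipeline maps three descriptions correctly and abstains on one.
predicted = {
    "paracetamol 500 mg cp": "161",
    "aspirine 100 mg": "1191",
    "metformine 850 mg": "6809",
    "ibuprofene 400": None,
}

precision, recall = mapping_precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these assumed definitions, a pipeline that abstains on uncertain inputs keeps precision high at the cost of recall, which is the trade-off that makes a high-precision output cheap to review.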
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.