Enhancing Healthcare Data Integration: A Machine Learning Approach to Harmonizing Laboratory Labels

Mehmet F Bagci, Samantha R Spierling, Anna L Ritko, Truong Nguyen, Brian D Modena, Yusuf Ozturk
AMIA Joint Summits on Translational Science proceedings, vol. 2025, pp. 65-73. Published 2025-06-10 (eCollection 2025). Journal Article.
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150698/pdf/

Abstract


Variations in laboratory test names across healthcare systems, stemming from inconsistent terminologies, abbreviations, misspellings, and assay vendors, pose significant challenges to the integration and analysis of clinical data. These discrepancies hinder interoperability and complicate efforts to extract meaningful insights for both clinical research and patient care. In this study, we propose a machine learning-driven solution, enhanced by natural language processing techniques, to standardize lab test names. By employing feature extraction methods that analyze both string similarity and the distributional properties of test results, we improve the harmonization of test names, resulting in a more robust dataset. Our model achieves 99% accuracy in matching lab names, showcasing the potential of AI-driven approaches in resolving long-standing standardization challenges. Importantly, this method enhances the reliability and consistency of clinical data, which is crucial for ensuring accurate results in large-scale clinical studies and improving the overall efficiency of informatics-based research and diagnostics.
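
The abstract does not say which features or model the authors used, so the following is only an illustrative sketch of the general idea of combining a string-similarity feature with a feature describing how closely two labels' result distributions agree. The function names (name_similarity, ks_statistic, pair_features), the choice of difflib's ratio, and the use of a two-sample Kolmogorov-Smirnov statistic are all assumptions made for this example, not details taken from the paper.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def ks_statistic(xs: list[float], ys: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest gap between the two empirical CDFs:
    0.0 means the samples look identical, 1.0 means they do not overlap.
    """
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    gap = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both pointers past every observation <= v.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(xs) - j / len(ys)))
    return gap


def pair_features(name_a, values_a, name_b, values_b) -> dict[str, float]:
    """Hypothetical feature vector for one candidate pair of lab labels."""
    return {
        "name_sim": name_similarity(name_a, name_b),
        "dist_sim": 1.0 - ks_statistic(values_a, values_b),
    }


# Made-up example values: two hemoglobin labels versus an HbA1c label.
hgb_site_a = [13.1, 14.2, 15.0, 12.8]
hgb_site_b = [13.5, 14.0, 14.8, 12.9]
hba1c = [5.4, 6.1, 7.0, 5.9]

same_test = pair_features("Hgb", hgb_site_a, "Hemoglobin", hgb_site_b)
different_test = pair_features("Hgb", hgb_site_a, "HbA1c", hba1c)
```

In a pipeline of this kind, feature dictionaries such as these would typically be passed to a classifier that decides whether two labels refer to the same test. The distributional feature helps in cases like the one above, where short abbreviations look alike as strings but the result values clearly differ.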
