Julia Gehrmann, Asme Dogan, Lea Hagelschuer, Lars Quakulinski, Anne Koy, Oya Beyan
Studies in Health Technology and Informatics, volume 331, pages 142-152. Journal Article. Published 2025-09-03. DOI: 10.3233/SHTI251390 (https://doi.org/10.3233/SHTI251390)
Catnip for MedCAT: Optimizing the Input for Automated SNOMED CT Mapping of Clinical Variables.
Introduction: Mapping local medical data assets to international data standards such as the medical ontology SNOMED CT fosters data harmonization and, thereby, global progress in medical research. Because manual SNOMED CT mapping is resource-intensive and therefore often infeasible, automated mapping tools such as MedCAT have been developed. We investigated how the formulation of study variable names (VNs) influences the efficacy and accuracy of the SNOMED CT concepts identified by MedCAT.
Methods: We extracted 763 VNs from the GEPESTIM database hosted locally in REDCap and created three VN versions using different REDCap metadata items for MedCAT-based SNOMED CT mapping. A fourth VN version was created manually. The mapping was evaluated based on the number and quality of identified SNOMED CT concepts, using blinded manual scoring to assess concept accuracy.
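As a rough illustration of the Methods step, the sketch below assembles several VN versions of increasing expressiveness from one REDCap data dictionary row. The column names `field_name`, `field_label`, `form_name`, and `section_header` are standard REDCap data dictionary columns, but the abstract does not state which metadata items were combined, so the three recipes, the helper names `clean` and `build_vn_versions`, and the example row are all assumptions made for illustration. The MedCAT call is deliberately left out; each resulting string would be the text handed to the annotator.

```python
import re


def clean(text):
    """Strip HTML tags and collapse whitespace in a metadata item."""
    text = re.sub(r"<[^>]+>", " ", text or "")
    return re.sub(r"\s+", " ", text).strip()


def build_vn_versions(row):
    """Return hypothetical VN versions built from one data dictionary row.

    The recipes are illustrative assumptions, not the authors' definitions.
    """
    label = clean(row.get("field_label"))
    section = clean(row.get("section_header"))
    form = clean(row.get("form_name")).replace("_", " ")
    # Empty metadata items are skipped so no dangling separators appear.
    parts_v2 = [p for p in (section, label) if p]
    parts_v3 = [p for p in (form, section, label) if p]
    return {
        "v1_label": label,
        "v2_section_label": ": ".join(parts_v2),
        "v3_form_section_label": ": ".join(parts_v3),
    }


# Invented example row; it is not taken from the GEPESTIM database.
row = {
    "field_name": "sz_freq",
    "form_name": "seizure_history",
    "section_header": "<b>Seizures</b>",
    "field_label": "Seizure  frequency per month",
}
versions = build_vn_versions(row)
for name, text in versions.items():
    print(name, "->", text)
```

Keeping every version in one dictionary per variable makes it straightforward to run the same annotator over each version and compare the mapped concepts side by side.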
Results: Increasing the expressiveness of VNs by adding more metadata items led to more SNOMED CT concepts being mapped, but also introduced mismatches, particularly when the added metadata contained misleading terms. The best overall mapping performance was achieved on the manually specified VNs, while a basic VN version with minimal extra information from the metadata yielded similarly good results.
Conclusion: Our study identified key challenges in using MedCAT for automatically mapping study variables to SNOMED CT concepts. To improve accuracy, we recommend refining VNs by reducing misleading terms and iteratively improving VN phrasing for an optimal mapping outcome. Furthermore, it appears reasonable to always conduct a final manual review of the mapping outcome, especially for critical variables and for VNs containing negations or abbreviations.
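The recommendation to manually review VNs containing negations or abbreviations could be supported by a simple pre-screening step. The sketch below is one possible heuristic, not a method described in the study: the cue list `NEGATION_CUES`, the all-caps rule for abbreviations, the function name `needs_manual_review`, and the example VNs are all assumptions.

```python
import re

# Hypothetical, deliberately small list of English negation cues.
NEGATION_CUES = {"no", "not", "none", "without", "never", "absent", "non"}


def needs_manual_review(vn):
    """Return the reasons a VN should be reviewed manually (empty if none)."""
    reasons = []
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9]*", vn)
    if any(t.lower() in NEGATION_CUES for t in tokens):
        reasons.append("negation")
    # Heuristic: all-caps tokens of 2 to 6 characters look like abbreviations.
    if any(re.fullmatch(r"[A-Z][A-Z0-9]{1,5}", t) for t in tokens):
        reasons.append("abbreviation")
    return reasons


# Invented example VNs for illustration only.
for vn in ["No history of DBS revision", "MRI performed", "Age at surgery"]:
    print(vn, "->", needs_manual_review(vn))
```

A heuristic like this only narrows down the review queue; it will miss lowercase abbreviations and implicit negations, so it complements the final manual check rather than replacing it.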