Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations
Sarah Riepenhausen, Max Blumenstock, Christian Niklas, Stefan Hegselmann, Philipp Neuhaus, Alexandra Meidt, Cornelia Püttmann, Michael Storck, Matthias Ganzinger, Julian Varghese, Martin Dugas
Methods of Information in Medicine, pp. 52-61. Published 2024-05-01 (Epub 2024-05-13). DOI: https://doi.org/10.1055/s-0044-1786839
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495939/pdf/
Abstract
Background: Structural metadata from the majority of clinical studies and routine health care systems is not yet available to the scientific community.
Objective: To provide an overview of available contents in the Portal of Medical Data Models (MDM Portal).
Methods: The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models.
Results: The most frequent keyword is "clinical trial" (n = 18,777), and the most frequent disease-specific keyword is "breast neoplasms" (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, with 66,373 unique UMLS codes.
Conclusion: To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can be used to increase compatibility of medical datasets and can be utilized as a large expert-annotated medical text corpus for natural language processing.
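The counts reported in the Results can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original paper: the variable names are illustrative, and the three ratios are derived here from the reported figures rather than stated by the authors.

```python
# Counts copied from the Results paragraph of the abstract.
annotated = {
    "items": 554_352,
    "item_groups": 58_101,
    "code_list_items": 192_855,
}
reported_total = 805_308    # annotated elements, as reported
assigned_codes = 1_609_225  # UMLS codes assigned in total
unique_codes = 66_373       # distinct UMLS codes
data_models = 25_257        # data models the elements were derived from

# The three element types should add up to the reported total.
total = sum(annotated.values())
assert total == reported_total

# Derived ratios (assumptions of this sketch, not figures from the paper).
codes_per_element = assigned_codes / total      # mean UMLS codes per annotated element
elements_per_model = total / data_models        # mean annotated elements per data model
uses_per_unique_code = assigned_codes / unique_codes  # mean assignments per distinct code

print(f"codes per element:    {codes_per_element:.2f}")
print(f"elements per model:   {elements_per_model:.2f}")
print(f"uses per unique code: {uses_per_unique_code:.2f}")
```

The sum check confirms that the breakdown by element type is internally consistent. The ratios are averages only and say nothing about how annotations are distributed across individual data models.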
About the journal:
Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal issue.