{"title":"TCMSF: A Construction Framework of Traditional Chinese Medicine Syndrome Ancient Book Knowledge Graph.","authors":"Ziling Zeng, Lin Tong, Bing Li, Wenjing Zong, Qikai Niu, Sihong Liu, Lei Zhang, Jialun Wang, Siqi Zhang, Siwei Tian, Jing'ai Wang, Wei Zhang, Huamin Zhang","doi":"10.1055/a-2590-6348","DOIUrl":"10.1055/a-2590-6348","url":null,"abstract":"<p><p>Syndrome is a unique and crucial concept in traditional Chinese medicine (TCM). However, much of the syndrome knowledge lacks systematic organization and correlation, and current information technologies are unsuitable for TCM ancient texts. We aimed to develop a knowledge graph that presents this knowledge in a more orderly, structured, and semantically oriented manner, providing a foundation for computer-aided diagnosis and treatment. We developed a construction framework of TCM syndrome knowledge from ancient books, using a pretrained model and rules (TCMSF). We fine-tuned the Enhanced Representation through Knowledge Integration (ERNIE) and Bidirectional Encoder Representations from Transformers (BERT) pretrained language models, and the ChatGLM3-6B large language model, for named entity recognition (NER) tasks. Furthermore, we employed the progressive entity relationship extraction method based on the dual pattern feature combination to extract and standardize entities and relationships between entities in these books. We selected Yin deficiency syndrome as a case study and constructed a model layer suitable for the expression of knowledge in these books. Among the NER methods compared, the combination of ERNIE and Conditional Random Fields performed best. By utilizing this combination, we completed the entity extraction of Yin deficiency syndrome, achieving an average F1 value of 0.77. The relationship extraction method we proposed reduces the number of incorrectly connected relationships compared with fully connected pattern layers. 
We successfully constructed a knowledge graph of ancient books on Yin deficiency syndrome, including over 120,000 entities and over 1.18 million relationships. We developed TCMSF in line with the knowledge characteristics of ancient TCM books and improved the accuracy of knowledge graph construction.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144036734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spencer Krichevsky, Evan T Sholle, Prakash M Adekkanattu, Sajjad Abedian, Madhu Ouseph, Elwood Taylor, Ghaith Abu-Zeinah, Diana Jaber, Claudia Sosner, Marika M Cusick, Niamh Savage, Richard T Silver, Joseph M Scandura, Thomas R Campion
{"title":"Automated Information Extraction from Unstructured Hematopathology Reports to Support Response Assessment in Myeloproliferative Neoplasms.","authors":"Spencer Krichevsky, Evan T Sholle, Prakash M Adekkanattu, Sajjad Abedian, Madhu Ouseph, Elwood Taylor, Ghaith Abu-Zeinah, Diana Jaber, Claudia Sosner, Marika M Cusick, Niamh Savage, Richard T Silver, Joseph M Scandura, Thomas R Campion","doi":"10.1055/a-2590-6456","DOIUrl":"https://doi.org/10.1055/a-2590-6456","url":null,"abstract":"<p><p>Assessing treatment response in patients with myeloproliferative neoplasms is difficult because data components exist in unstructured bone marrow pathology (hematopathology) reports, which require specialized, manual annotation, and interpretation. Although natural language processing (NLP) has been successfully implemented for the extraction of features from solid tumor reports, little is known about its application to hematopathology. An open-source NLP framework called Leo was implemented to parse document segments and extract concept phrases utilized for assessing responses in myeloproliferative neoplasms. A reference standard was generated through the manual review of hematopathology notes. Compared with a reference standard (<i>n</i> = 300 reports), our NLP method extracted features such as aspirate myeloblasts (F1 = 98%) and biopsy reticulin fibrosis (F1 = 93%) with high accuracy. However, other values, such as myeloblasts from the biopsy (F1 = 6%) and via flow cytometry (F1 = 8%), were affected by sparsity representative of reporting conventions. The four features with the highest clinical importance were extracted with F1 scores exceeding 90%. Whereas manual annotation of 300 reports required 30 hours of staff effort, automated NLP required 3.5 hours of runtime for 34,301 reports. To the best of our knowledge, this is among the first studies to demonstrate the application of NLP to hematopathology for clinical feature extraction. 
The approach may inform efforts at other institutions, and the code is available at https://github.com/wcmc-research-informatics/BmrExtractor.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144054744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Meng Ren, Hong Gao, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Chu Liao, Junqiu Ye, Qi Hao, Xinyan Wang, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou
{"title":"ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data.","authors":"Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Meng Ren, Hong Gao, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Chu Liao, Junqiu Ye, Qi Hao, Xinyan Wang, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou","doi":"10.1055/a-2576-1847","DOIUrl":"https://doi.org/10.1055/a-2576-1847","url":null,"abstract":"<p><p>Symptom phenotypes are crucial for diagnosing and treating various disease conditions. However, the diversity of symptom terminologies poses a significant challenge to analyzing and sharing symptom-related medical data, particularly in the field of traditional Chinese medicine (TCM). This study aims to construct an Integrated Symptom Phenotype Ontology (ISPO) to support data mining of Chinese electronic medical records (EMRs) and real-world studies in the TCM field. We manually annotated and extracted symptom terms from 21 classical TCM textbooks and 78,696 inpatient EMRs, and integrated them with five publicly available symptom-related biomedical vocabularies. Through a human-machine collaborative approach for terminology editing and ontology development, including term screening, semantic mapping, and concept classification, we constructed a high-quality symptom ontology that integrates both TCM and Western medical terminology. ISPO provides 3,147 concepts, 23,475 terms, and 23,363 hierarchical relationships. Compared with international symptom-related ontologies such as the Symptom Ontology, ISPO offers significant improvements in the number of terms and synonymous relationships. Furthermore, evaluation across three independent curated clinical datasets demonstrated that ISPO achieved over 90% coverage of symptom terms, highlighting its strong clinical usability and completeness. ISPO represents the first clinical ontology globally dedicated to the systematic representation of symptoms. 
It integrates symptom terminologies from historical and contemporary sources, encompassing both TCM and Western medicine, thereby enhancing semantic interoperability across heterogeneous medical data sources and clinical decision support systems in TCM.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144022609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heidelinde Dehaene, Alexander Decruyenaere, Christiaan Polet, Johan Decruyenaere, Paloma Rabaey, Thomas Demeester, Stijn Vansteelandt
{"title":"Why Synthetic Discoveries are Not Only a Problem of Differentially Private Synthetic Data.","authors":"Heidelinde Dehaene, Alexander Decruyenaere, Christiaan Polet, Johan Decruyenaere, Paloma Rabaey, Thomas Demeester, Stijn Vansteelandt","doi":"10.1055/a-2540-8284","DOIUrl":"https://doi.org/10.1055/a-2540-8284","url":null,"abstract":"","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144036735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ileana Montoya Perez, Parisa Movahedi, Valtteri Nieminen, Antti Airola, Tapio Pahikkala
{"title":"Response to Commentary by Dehaene et al. on Synthetic Discovery is not only a Problem of Differentially Private Synthetic Data.","authors":"Ileana Montoya Perez, Parisa Movahedi, Valtteri Nieminen, Antti Airola, Tapio Pahikkala","doi":"10.1055/a-2540-8346","DOIUrl":"https://doi.org/10.1055/a-2540-8346","url":null,"abstract":"","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143993782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ismat Mohd Sulaiman, Awang Bulgiba, Sameem Abdul Kareem, Abdul Aziz Latip
{"title":"Deciphering Abbreviations in Malaysian Clinical Notes Using Machine Learning.","authors":"Ismat Mohd Sulaiman, Awang Bulgiba, Sameem Abdul Kareem, Abdul Aziz Latip","doi":"10.1055/a-2521-4372","DOIUrl":"10.1055/a-2521-4372","url":null,"abstract":"<p><strong>Objective: </strong> This is the first Malaysian machine learning model to detect and disambiguate abbreviations in clinical notes. The model has been designed to be incorporated into MyHarmony, a natural language processing system, that extracts clinical information for health care management. The model utilizes word embedding to ensure feasibility of use, not in real-time but for secondary analysis, within the constraints of low-resource settings.</p><p><strong>Methods: </strong> A Malaysian clinical embedding, based on Word2Vec model, was developed using 29,895 electronic discharge summaries. The embedding was compared against conventional rule-based and FastText embedding on two tasks: abbreviation detection and abbreviation disambiguation. Machine learning classifiers were applied to assess performance.</p><p><strong>Results: </strong> The Malaysian clinical word embedding contained 7 million word tokens, 24,352 unique vocabularies, and 100 dimensions. For abbreviation detection, the Decision Tree classifier augmented with the Malaysian clinical embedding showed the best performance (F-score of 0.9519). For abbreviation disambiguation, the classifier with the Malaysian clinical embedding had the best performance for most of the abbreviations (F-score of 0.9903).</p><p><strong>Conclusion: </strong> Despite having a smaller vocabulary and dimension, our local clinical word embedding performed better than the larger nonclinical FastText embedding. Word embedding with simple machine learning algorithms can decipher abbreviations well. It also requires lower computational resources and is suitable for implementation in low-resource settings such as Malaysia. 
The integration of this model into MyHarmony will improve recognition of clinical terms, thus improving the information generated for monitoring Malaysian health care services and policymaking.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143025162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuntaro Yada, Yuta Nakamura, Shoko Wakamiya, Eiji Aramaki
{"title":"Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop.","authors":"Shuntaro Yada, Yuta Nakamura, Shoko Wakamiya, Eiji Aramaki","doi":"10.1055/a-2405-2489","DOIUrl":"10.1055/a-2405-2489","url":null,"abstract":"<p><strong>Background: </strong> Textual datasets (corpora) are crucial for the application of natural language processing (NLP) models. However, corpus creation in the medical field is challenging, primarily because of privacy issues with raw clinical data such as health records. Thus, the existing clinical corpora are generally small and scarce. Medical NLP (MedNLP) methodologies perform well with limited data availability.</p><p><strong>Objectives: </strong> We present the outcomes of the Real-MedNLP workshop, which was conducted using limited and parallel medical corpora. Real-MedNLP exhibits three distinct characteristics: (1) limited annotated documents: the training data comprise only a small set (∼100) of case reports (CRs) and radiology reports (RRs) that have been annotated. (2) Bilingually parallel: the constructed corpora are parallel in Japanese and English. (3) Practical tasks: the workshop addresses fundamental tasks, such as named entity recognition (NER) and applied practical tasks.</p><p><strong>Methods: </strong> We propose three tasks: NER of ∼100 available documents (Task 1), NER based only on annotation guidelines for humans (Task 2), and clinical applications (Task 3) consisting of adverse drug effect (ADE) detection for CRs and identical case identification (CI) for RRs.</p><p><strong>Results: </strong> Nine teams participated in this study. The best systems achieved 0.65 and 0.89 F1-scores for CRs and RRs in Task 1, whereas the top scores in Task 2 decreased by 50 to 70%. 
In Task 3, ADE reports were detected by up to 0.64 F1-score, and CI scored up to 0.96 binary accuracy.</p><p><strong>Conclusion: </strong> Most systems adopt medical-domain-specific pretrained language models using data augmentation methods. Despite the challenge of limited corpus size in Tasks 1 and 2, recent approaches are promising because the partial match scores reached ∼0.8-0.9 F1-scores. Task 3 applications revealed that the different availabilities of external language resources affected the performance per language.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142114054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pardeep Vasudev, Mehran Azimbagirad, Shahab Aslani, Moucheng Xu, Yufei Wang, Robert Chapman, Hannah Coleman, Christopher Werlein, Claire Walsh, Peter Lee, Paul Tafforeau, Joseph Jacob
{"title":"Harnessing Advanced Machine Learning Techniques for Microscopic Vessel Segmentation in Pulmonary Fibrosis Using Novel Hierarchical Phase-Contrast Tomography Images.","authors":"Pardeep Vasudev, Mehran Azimbagirad, Shahab Aslani, Moucheng Xu, Yufei Wang, Robert Chapman, Hannah Coleman, Christopher Werlein, Claire Walsh, Peter Lee, Paul Tafforeau, Joseph Jacob","doi":"10.1055/a-2540-8166","DOIUrl":"10.1055/a-2540-8166","url":null,"abstract":"<p><strong>Background: </strong> Fibrotic lung disease is a progressive illness that causes scarring and ultimately respiratory failure, with irreversible damage by the time it is diagnosed on computed tomography imaging. Recent research postulates a role for the lung vasculature in the pathogenesis of the disease. With the recent development of high-resolution hierarchical phase-contrast tomography (HiP-CT), we have the potential to understand and detect changes in the lungs long before conventional imaging. However, to gain quantitative insight into vascular changes, the vessels must first be segmented before further downstream analysis can be conducted. In addition, HiP-CT generates large-volume, high-resolution data that are time-consuming and expensive to label.</p><p><strong>Objectives: </strong> This project aims to qualitatively assess the latest machine learning methods for vessel segmentation in HiP-CT data to enable label propagation as the first step for imaging biomarker discovery, with the goal of identifying early-stage interstitial lung disease amenable to treatment, before fibrosis begins.</p><p><strong>Methods: </strong> Semisupervised learning (SSL) is an increasingly used approach to sparsely labeled datasets because it leverages unlabeled data. 
In this study, we compare two SSL methods, SegPL (based on pseudo-labeling) and MisMatch (using consistency regularization), against the state-of-the-art supervised learning method nnU-Net on vessel segmentation in sparsely labeled lung HiP-CT data.</p><p><strong>Results: </strong> On initial experimentation, both MisMatch and SegPL showed promising performance on qualitative review. In comparison with supervised learning, both MisMatch and SegPL showed better out-of-distribution performance within the same sample (vessels with different morphology and texture), though supervised learning provided more consistent segmentations for well-represented labels in the limited annotations.</p><p><strong>Conclusion: </strong> Further quantitative research is required to better assess the generalizability of these findings, though they show promising first steps toward leveraging this novel data to tackle fibrotic lung disease.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":"97-108"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133326/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143450734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sofie Holmeland, Tobias Blomberg, Andreas Mårtensson, Sabine Koch
{"title":"Toward a National Information Model for Medication Orders in Sweden.","authors":"Sofie Holmeland, Tobias Blomberg, Andreas Mårtensson, Sabine Koch","doi":"10.1055/a-2546-4092","DOIUrl":"10.1055/a-2546-4092","url":null,"abstract":"<p><p>Semantic interoperability among health information systems (HISs), in particular electronic health records (EHRs), is crucial for informed healthcare decisions and access to vital health data by the patient. However, inconsistent medication information and limited health data exchange contribute to medication errors worldwide. Although Sweden offers various solutions for health information exchange, the exchange of medication orders is limited and the structure of medication orders across EHRs is poorly understood, highlighting the need for further exploration of that structure. This study aims to develop a common information model of medication orders for EHRs to be used in the Swedish context. An explorative qualitative design study was conducted. Documents and reference models of how medication orders are structured were collected, and semi-structured interviews were conducted with five purposefully selected participants with insight into how medication orders are structured in EHRs in Sweden. Data were analyzed using information needs analysis, information structure analysis, and code systems, classifications, and terminology analysis. The following information areas were identified for a medication order: medication, medication indication, way of administration, medication order details, and dosage. These information areas were conceptualized into a Unified Modeling Language Class Diagram information model with defined classes, attributes, and data types. 
The resulting information model provides a representation of how medication orders are depicted in EHRs in Sweden and is aligned with existing national information models such as the National Medication List, while still providing additional information related to medication order details. The developed information model could potentially provide a national standardized model for medication orders, contributing to enhanced semantic interoperability and improving data exchange across various HISs. This could enhance data consistency, reducing the risk of medication errors and thereby improving patient safety.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":"109-121"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133328/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143525129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexa Iancu, Johannes Bauer, Matthias S May, Hans-Ulrich Prokosch, Arnd Dörfler, Michael Uder, Lorenz A Kapsner
{"title":"Large-Scale Integration of DICOM Metadata into HL7-FHIR for Medical Research.","authors":"Alexa Iancu, Johannes Bauer, Matthias S May, Hans-Ulrich Prokosch, Arnd Dörfler, Michael Uder, Lorenz A Kapsner","doi":"10.1055/a-2521-4250","DOIUrl":"10.1055/a-2521-4250","url":null,"abstract":"<p><strong>Background: </strong> The current gap between the availability of routine imaging data and its provisioning for medical research hinders the utilization of radiological information for secondary purposes. To address this, the German Medical Informatics Initiative (MII) has established frameworks for harmonizing and integrating clinical data across institutions, including the integration of imaging data into research repositories, which can be expanded to routine imaging data.</p><p><strong>Objectives: </strong> This project aims to address this gap by developing a large-scale data processing pipeline to extract, convert, and pseudonymize DICOM (Digital Imaging and Communications in Medicine) metadata into \"ImagingStudy\" Fast Healthcare Interoperability Resources (FHIR) and integrate them into research repositories for secondary use.</p><p><strong>Methods: </strong> The data processing pipeline was developed, implemented, and tested at the Data Integration Center of the University Hospital Erlangen. It leverages existing open-source solutions and integrates seamlessly into the hospital's research IT infrastructure. The pipeline automates the extraction, conversion, and pseudonymization processes, ensuring compliance with both local and MII data protection standards. A large-scale evaluation was conducted using the imaging studies acquired by two departments at University Hospital Erlangen within 1 year. 
Attributes such as modality, examined body region, laterality, and the number of series and instances were analyzed to assess the quality and availability of the metadata.</p><p><strong>Results: </strong> Once established, the pipeline processed a substantial dataset comprising over 150,000 DICOM studies within an operational period of 26 days. Data analysis revealed significant heterogeneity and incompleteness in certain attributes, particularly the DICOM tag \"Body Part Examined.\" Despite these challenges, the pipeline successfully generated valid and standardized FHIR, providing a robust basis for future research.</p><p><strong>Conclusion: </strong> We demonstrated the setup and test of a large-scale end-to-end data processing pipeline that transforms DICOM imaging metadata directly from clinical routine into the Health Level 7-FHIR format, pseudonymizes the resources, and stores them in an FHIR server. We showcased that the derived FHIRs offer numerous research opportunities, for example, feasibility assessments within Bavarian and Germany-wide research infrastructures. Insights from this study highlight the need to extend the \"ImagingStudy\" FHIR with additional attributes and refine their use within the German MII.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":" ","pages":"77-84"},"PeriodicalIF":1.3,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133321/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143993767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}