Sathvik Guru Rao, Pranitha Rokkam, Bide Zhang, Astghik Sargsyan, Abish Kaladharan, Priya Sethumadhavan, Marc Jacobs, Martin Hofmann-Apitius, Alpha Tom Kodamullil
{"title":"Three-layered semantic framework for public health intelligence.","authors":"Sathvik Guru Rao, Pranitha Rokkam, Bide Zhang, Astghik Sargsyan, Abish Kaladharan, Priya Sethumadhavan, Marc Jacobs, Martin Hofmann-Apitius, Alpha Tom Kodamullil","doi":"10.1186/s13326-025-00338-1","DOIUrl":"10.1186/s13326-025-00338-1","url":null,"abstract":"<p><strong>Background: </strong>Disease surveillance systems play a crucial role in monitoring and preventing infectious diseases. However, the current landscape, primarily focused on fragmented health data, poses challenges to contextual understanding and decision-making. This paper addresses this issue by proposing a semantic framework using ontologies to provide a unified data representation for seamless integration. The paper demonstrates the effectiveness of this approach using a case study of a COVID-19 incident at a football game in Italy.</p><p><strong>Method: </strong>In this study, we undertook a comprehensive approach to gather and analyze data for the development of ontologies within the realm of pandemic intelligence. Multiple ontologies were meticulously crafted to cater to different domains related to pandemic intelligence, such as healthcare systems, mass gatherings, travel, and diseases. The ontologies were classified into top-level, domain, and application layers. This classification facilitated the development of a three-layered architecture, promoting reusability and consistency in knowledge representation and serving as the backbone of our semantic framework.</p><p><strong>Result: </strong>Using our semantic framework, we semantically enriched both structured and unstructured data. The integration of data from diverse sources involved mapping to ontology concepts, leading to the creation and storage of RDF triples in the triple store. 
This process resulted in the construction of linked data, ultimately enhancing the discoverability and accessibility of valuable insights. Furthermore, our anomaly detection algorithm effectively leveraged knowledge graphs extracted from the triple store, employing semantic relationships to discern patterns and anomalies within the data. Notably, this capability was exemplified by the identification of correlations between a football game and a COVID-19 event occurring at the same location and time.</p><p><strong>Conclusion: </strong>The framework showcased its capability to address intricate, multi-domain queries and support diverse levels of detail. Additionally, it demonstrated proficiency in data analysis and visualization, generating graphs that depict patterns and trends; however, challenges related to ontology maintenance, alignment, and mapping must be addressed for the approach's optimal utilization.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"17"},"PeriodicalIF":2.0,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12439389/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145069053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adeel Ansari, Marisa Conte, Allen Flynn, Avanti Paturkar
{"title":"A prototype ETL pipeline that uses HL7 FHIR RDF resources when deploying pure functions to enrich knowledge graph patient data.","authors":"Adeel Ansari, Marisa Conte, Allen Flynn, Avanti Paturkar","doi":"10.1186/s13326-025-00335-4","DOIUrl":"https://doi.org/10.1186/s13326-025-00335-4","url":null,"abstract":"<p><strong>Background: </strong>For clinical care and research, knowledge graphs with patient data can be enriched by extracting parameters from a knowledge graph and then using them as inputs to compute new patient features with pure functions. Systematic and transparent methods for enriching knowledge graphs with newly computed patient features are of interest. When enriching the patient data in knowledge graphs this way, existing ontologies and well-known data resource standards can help promote semantic interoperability.</p><p><strong>Results: </strong>We developed and tested a new data processing pipeline for extracting, computing, and returning newly computed results to a large knowledge graph populated with electronic health record and patient survey data. We show that RDF data resource types already specified by Health Level 7's FHIR RDF effort can be programmatically validated and then used by this new data processing pipeline to represent newly derived patient-level features.</p><p><strong>Conclusions: </strong>Knowledge graph technology can be augmented with standards-based semantic data processing pipelines for deploying and tracing the use of pure functions to derive new patient-level features from existing data. 
Semantic data processing pipelines enable research enterprises to report on new patient-level computations of interest with linked metadata that details the origin and background of every new computation.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"16"},"PeriodicalIF":2.0,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erik M van Mulligen, Rowan Parry, Johan van der Lei, Jan A Kors
{"title":"Mapping between clinical and preclinical terminologies: eTRANSAFE's Rosetta stone approach.","authors":"Erik M van Mulligen, Rowan Parry, Johan van der Lei, Jan A Kors","doi":"10.1186/s13326-025-00337-2","DOIUrl":"https://doi.org/10.1186/s13326-025-00337-2","url":null,"abstract":"<p><strong>Background: </strong>The eTRANSAFE project developed tools that support translational research. One of the challenges in this project was to combine preclinical and clinical data, which are coded with different terminologies and granularities, and are expressed as single pre-coordinated, clinical concepts and as combinations of preclinical concepts from different terminologies. This study develops and evaluates the Rosetta Stone approach, which maps combinations of preclinical concepts to clinical, pre-coordinated concepts, allowing for different levels of exactness of mappings.</p><p><strong>Methods: </strong>Concepts from preclinical and clinical terminologies used in eTRANSAFE have been mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). SNOMED CT acts as an intermediary terminology that provides the semantics to bridge between pre-coordinated clinical concepts and combinations of preclinical concepts with different levels of granularity. The mappings from clinical terminologies to SNOMED CT were taken from existing resources, while mappings from the preclinical terminologies to SNOMED CT were manually created. A coordination template defines the relation types that can be explored for a mapping and assigns a penalty score that reflects the inexactness of the mapping. A subset of 60 pre-coordinated concepts was mapped both with the Rosetta Stone semantic approach and with a lexical term matching approach. 
Both results were manually evaluated.</p><p><strong>Results: </strong>A total of 34,308 concepts from preclinical terminologies (Histopathology terminology, Standard for Exchange of Nonclinical Data (SEND) code lists, Mouse Adult Gross Anatomy Ontology) and a clinical terminology (MedDRA) were mapped to SNOMED CT as the intermediary bridging terminology. A terminology service has been developed that dynamically returns the exact and inexact mappings between preclinical and clinical concepts. On the evaluation set, the precision of the mappings from the terminology service was high (95%), much higher than for lexical term matching (22%).</p><p><strong>Conclusion: </strong>The Rosetta Stone approach uses a semantically rich intermediate terminology to map between pre-coordinated clinical concepts and a combination of preclinical concepts with different levels of exactness. The ability to generate not only exact but also inexact mappings makes it possible to relate larger amounts of preclinical and clinical data, which can be helpful in translational use cases.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"15"},"PeriodicalIF":2.0,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12372267/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BASIL DB: bioactive semantic integration and linking database.","authors":"David Jackson, Paul Groth, Hazar Harmouch","doi":"10.1186/s13326-025-00336-3","DOIUrl":"10.1186/s13326-025-00336-3","url":null,"abstract":"<p><strong>Background: </strong>Bioactive compounds found in foods and plants can provide health benefits, including antioxidant and anti-inflammatory effects. Research into their role in disease prevention and personalized nutrition is expanding, but challenges such as data complexity, inconsistent methods, and the rapid growth of scientific literature can hinder progress. To address these issues, we developed BASIL DB (BioActive Semantic Integration and Linking Database), a knowledge graph (KG) database that leverages natural language processing (NLP) techniques to streamline data organization and analysis. This automated approach offers greater scalability and comprehensiveness than traditional methods such as manual data curation and entry.</p><p><strong>Construction and content: </strong>The process of constructing the BASIL DB is divided into four fundamental steps: data collection, data preprocessing, data extraction, and data integration. Data on bioactives and foods are sourced from structured databases. The relevant randomized controlled trials (RCTs) were extracted from PubMed. The data are then prepared by cleaning inconsistencies and structuring them for analysis. In the data extraction phase, NLP tools, including a large language model (LLM), are utilized to analyze clinical trials and extract data on bioactive compounds and their health impacts. The integration phase compiles these data into a knowledge graph, which consists of the entities Foods, Bioactives, and Health Conditions as nodes and their interactions as edges. 
To quantify the relationships/interactions between these entities, we generate a weight for each edge on the basis of empirical evidence and methodological rigor.</p><p><strong>Utility and discussion: </strong>The BASIL DB incorporates 433 compounds, 40296 research papers, 7256 health effects, and 4197 food items. The database features query and visualization capabilities, including interactive graphs and custom filtering options, that showcase different aspects of the data. Users are able to explore the relationships between bioactives and health effects, enhancing both research efficiency and insight discovery.</p><p><strong>Conclusion: </strong>The BASIL DB is a knowledge graph database of bioactive compounds. This study provides a structured resource for exploring the relationships among bioactives, foods, and health outcomes, representing a step toward a more systematic and data-driven approach to understanding the health effects of bioactive compounds. Future work will focus on expanding the database and refining the utilized methods. 
Extending the BASIL DB will help bridge the gap between traditional and conventional approaches to nutrition, guiding future research in bioactive compound discovery and health optimization.</p><p><strong>Availability: </strong>Users can access and explore the data via https://basil-db.github.io/info.html or fork and run the respective script via https://github.com/basil-db/scr","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"14"},"PeriodicalIF":2.0,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12351831/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144846598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raniah Nur Hanami, Rahmad Mahendra, Alfan Farizki Wicaksono
{"title":"Semantic classification of Indonesian consumer health questions.","authors":"Raniah Nur Hanami, Rahmad Mahendra, Alfan Farizki Wicaksono","doi":"10.1186/s13326-025-00334-5","DOIUrl":"10.1186/s13326-025-00334-5","url":null,"abstract":"<p><strong>Purpose: </strong>Online consumer health forums serve as a way for the public to connect with medical professionals. While these medical forums offer a valuable service, online Question Answering (QA) forums can struggle to deliver timely answers due to the limited number of available healthcare professionals. One way to solve this problem is by developing an automatic QA system that can provide patients with quicker answers. One key component of such a system could be a module for classifying the semantic type of a question. This would allow the system to understand the patient's intent and route them towards the relevant information.</p><p><strong>Methods: </strong>This paper proposes a novel two-step approach to address the challenge of semantic type classification in Indonesian consumer health questions. We acknowledge the scarcity of Indonesian health domain data, a hurdle for machine learning models. To address this gap, we first introduce a novel corpus of annotated Indonesian consumer health questions. Second, we utilize this newly created corpus to build and evaluate a data-driven predictive model for classifying question semantic types. To enhance the trustworthiness and interpretability of the model's predictions, we employ an explainable model framework, LIME. This framework facilitates a deeper understanding of the role played by word-based features in the model's decision-making process. 
Additionally, it empowers us to conduct a comprehensive bias analysis, allowing for the detection of \"semantic bias\", where words with no inherent association with a specific semantic type disproportionately influence the model's predictions.</p><p><strong>Results: </strong>The annotation process revealed moderate agreement between expert annotators. In addition, not all words with high LIME probability could be considered true characteristics of a question type. This suggests a potential bias in the data used and the machine learning models themselves. Notably, XGBoost, Naïve Bayes, and MLP models exhibited a tendency to predict questions containing the words \"kanker\" (cancer) and \"depresi\" (depression) as belonging to the DIAGNOSIS category. In terms of prediction performance, Perceptron and XGBoost emerged as the top-performing models, achieving the highest weighted average F1 scores across all input scenarios and weighting factors. Naïve Bayes performed best after balancing the data with Borderline SMOTE, indicating its promise for handling imbalanced datasets.</p><p><strong>Conclusion: </strong>We constructed a corpus of query semantics in the domain of Indonesian consumer health, containing 964 questions annotated with their corresponding semantic types. This corpus served as the foundation for building a predictive model. We further investigated the impact of disease-biased words on model performance. These words exhibited high LIME scores, yet lacked association with a specific semantic type. 
We trained models using datasets with ","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"13"},"PeriodicalIF":2.0,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12302743/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144731118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Beverley, Shane Babcock, Carter Benson, Giacomo De Colle, Sydney Cohen, Alexander D Diehl, Ram A N R Challa, Rachel A Mavrovich, Joshua Billig, Anthony Huffman, Yongqun He
{"title":"A fourfold pathogen reference ontology suite.","authors":"John Beverley, Shane Babcock, Carter Benson, Giacomo De Colle, Sydney Cohen, Alexander D Diehl, Ram A N R Challa, Rachel A Mavrovich, Joshua Billig, Anthony Huffman, Yongqun He","doi":"10.1186/s13326-025-00333-6","DOIUrl":"10.1186/s13326-025-00333-6","url":null,"abstract":"<p><strong>Background: </strong>Infectious diseases remain a critical global health challenge, and the integration of standardized ontologies plays a vital role in managing related data. The Infectious Disease Ontology (IDO) and its extensions, such as the Coronavirus Infectious Disease Ontology (CIDO), are essential for organizing and disseminating information related to infectious diseases. The COVID-19 pandemic highlighted the need for updating IDO and its virus-specific extensions. There is an additional need to update IDO extensions specific to bacteria, fungus, and parasite infectious diseases.</p><p><strong>Methods: </strong>The \"hub-and-spoke\" methodology is adopted to generate pathogen-specific extensions of IDO: Virus Infectious Disease Ontology (VIDO), Bacteria Infectious Disease Ontology (BIDO), Mycosis Infectious Disease Ontology (MIDO), and Parasite Infectious Disease Ontology (PIDO).</p><p><strong>Results: </strong>IDO is introduced before reporting on the scopes, major classes and relations, applications and extensions of IDO to VIDO, BIDO, MIDO, and PIDO.</p><p><strong>Conclusions: </strong>The creation of pathogen-specific reference ontologies advances modularization and reusability of infectious disease ontologies within the IDO ecosystem. 
Future work will focus on further refining these ontologies, creating new extensions, and developing application ontologies based on them, in line with ongoing efforts to standardize biological and biomedical terminologies for improved data sharing, quality, and analysis.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"12"},"PeriodicalIF":2.0,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12239493/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144600470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lizzy Farrugia, Lilian M Azzopardi, Jeremy Debattista, Charlie Abela
{"title":"medicX-KG: a knowledge graph for pharmacists' drug information needs.","authors":"Lizzy Farrugia, Lilian M Azzopardi, Jeremy Debattista, Charlie Abela","doi":"10.1186/s13326-025-00332-7","DOIUrl":"10.1186/s13326-025-00332-7","url":null,"abstract":"<p>The role of pharmacists is evolving from medicine dispensing to delivering comprehensive pharmaceutical services within multidisciplinary healthcare teams. Central to this shift is access to accurate, up-to-date medicinal product information supported by robust data integration. Leveraging artificial intelligence and semantic technologies, Knowledge Graphs (KGs) uncover hidden relationships and enable data-driven decision-making. This paper presents medicX-KG, a pharmacist-oriented knowledge graph supporting clinical and regulatory decisions. It forms the semantic layer of the broader medicX platform, powering predictive and explainable pharmacy services. medicX-KG integrates data from three sources: the British National Formulary (BNF), DrugBank, and the Malta Medicines Authority (MMA), addressing Malta's regulatory landscape, which combines European Medicines Agency alignment with partial UK supply dependence. The KG tackles the absence of a unified national drug repository, reducing pharmacists' reliance on fragmented sources. Its design was informed by interviews with practising pharmacists to ensure real-world applicability. We detail the KG's construction, including data extraction, ontology design, and semantic mapping. Evaluation demonstrates that medicX-KG effectively supports queries about drug availability, interactions, adverse reactions, and therapeutic classes. 
Limitations, including missing detailed dosage encoding and real-time updates, are discussed alongside directions for future enhancements.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"11"},"PeriodicalIF":2.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211240/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144540334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unveiling differential adverse event profiles in vaccines via LLM text embeddings and ontology semantic analysis.","authors":"Zhigang Wang, Xingxian Li, Jie Zheng, Yongqun He","doi":"10.1186/s13326-025-00331-8","DOIUrl":"10.1186/s13326-025-00331-8","url":null,"abstract":"<p><strong>Background: </strong>Vaccines are crucial for preventing infectious diseases; however, they may also be associated with adverse events (AEs). Conventional analysis of vaccine AEs relies on manual review and assignment of AEs to terms in terminology or ontology, which is a time-consuming process and constrained in scope. This study explores the potential of using Large Language Models (LLMs) and LLM text embeddings for efficient and comprehensive vaccine AE analysis.</p><p><strong>Results: </strong>We used Llama-3 LLM to extract AE information from FDA-approved vaccine package inserts for 111 licensed vaccines, including 15 influenza vaccines. Text embeddings were then generated for each vaccine's AEs using the nomic-embed-text and mxbai-embed-large models. Llama-3 achieved over 80% accuracy in extracting AE text from vaccine package inserts. To further evaluate the performance of text embedding, the vaccines were clustered using two clustering methods: (1) LLM text embedding-based clustering and (2) ontology-based semantic similarity analysis. The ontology-based method mapped AEs to the Human Phenotype Ontology (HPO) and Ontology of Adverse Events (OAE), with semantic similarity analyzed using Lin's method. Text embeddings were generated for each vaccine's AE description using the LLM nomic-embed-text and mxbai-embed-large models. Compared to the semantic similarity analysis, the LLM approach was able to capture more differential AE profiles. Furthermore, LLM-derived text embeddings were used to develop a Lasso logistic regression model to predict whether a vaccine is \"Live\" or \"Non-Live\". 
The term \"Non-Live\" refers to all vaccines that do not contain live organisms, including inactivated and mRNA vaccines. A comparative analysis showed that, despite similar clustering patterns, the nomic-embed-text model outperformed the mxbai-embed-large model. It achieved 80.00% sensitivity, 83.06% specificity, and 81.89% accuracy in a 10-fold cross-validation. Our analysis of the AE LLM embeddings identified many AE patterns, which are illustrated with examples.</p><p><strong>Conclusion: </strong>This study demonstrates the effectiveness of LLMs for automated AE extraction and analysis, and shows that LLM text embeddings capture latent information about AEs, enabling more comprehensive knowledge discovery. Our findings suggest that LLMs have substantial potential for improving vaccine safety and public health research.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"10"},"PeriodicalIF":2.0,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12102970/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144132354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vasundra Touré, Deepak Unni, Philip Krauss, Abdelhamid Abdelwahed, Jascha Buchhorn, Leon Hinderling, Thomas R Geiger, Sabine Österle
{"title":"The SPHN Schema Forge - transform healthcare semantics from human-readable to machine-readable by leveraging semantic web technologies.","authors":"Vasundra Touré, Deepak Unni, Philip Krauss, Abdelhamid Abdelwahed, Jascha Buchhorn, Leon Hinderling, Thomas R Geiger, Sabine Österle","doi":"10.1186/s13326-025-00330-9","DOIUrl":"10.1186/s13326-025-00330-9","url":null,"abstract":"<p><strong>Background: </strong>The Swiss Personalized Health Network (SPHN) adopted the Resource Description Framework (RDF), a core component of the Semantic Web technology stack, for the formal encoding and exchange of healthcare data in a medical knowledge graph. The SPHN RDF Schema defines the semantics of how data elements should be represented. While RDF is proven to be machine readable and interpretable, it can be challenging for individuals without a specialized background to read and understand the knowledge represented in RDF. For this reason, the semantics described in the SPHN RDF Schema are primarily defined in a user-accessible tabular format, the SPHN Dataset, before being translated into their RDF representation. However, this translation process was previously manual, time-consuming, and labor-intensive.</p><p><strong>Result: </strong>To automate and streamline the translation from tabular to RDF representation, the SPHN Schema Forge web service was developed. With a few clicks, this tool automatically converts an SPHN-compliant Dataset spreadsheet into an RDF schema. 
Additionally, it generates SHACL rules for data validation, an HTML visualization of the schema and SPARQL queries for basic data analysis.</p><p><strong>Conclusion: </strong>The SPHN Schema Forge significantly reduces the manual effort and time required for schema generation, enabling researchers to focus on more meaningful tasks such as data interpretation and analysis within the SPHN framework.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"9"},"PeriodicalIF":2.0,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12063216/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144005244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.","authors":"Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono","doi":"10.1186/s13326-025-00329-2","DOIUrl":"10.1186/s13326-025-00329-2","url":null,"abstract":"<p><strong>Purpose: </strong>Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can impact response times to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that the development of these three modules would enable the system to identify critical components of the question, thereby facilitating a deeper understanding of the question, and allowing for the re-formulation of more effective questions with extracted key information.</p><p><strong>Methods: </strong>This work contributes to two key aspects related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks within the Indonesian language domain by employing transformer-based models with nine distinct encoder variations. Our feature studies revealed an interdependence among these three tasks. 
Consequently, we propose several multi-task learning (MTL) models, both in pairwise and three-way configurations, incorporating parallel and hierarchical architectures.</p><p><strong>Results: </strong>Using F1-score at the chunk level, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, the best performance for each task was achieved by a different model, with IndoNLU<sub>LARGE</sub> obtaining the highest average score. These results suggested that a larger model did not always perform better. We also found no indication of whether Indonesian or multilingual language models generally performed better for our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Despite varying loss weights across our three-way MTL models, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model for the SR task.</p><p><strong>Conclusion: </strong>We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points, which were split into 773 training instances, 200 validation instances, and 200 testing instances. We then used transformer-based models to set a baseline for all three tasks. 
Our MTL experiments suggested that additional informat","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"8"},"PeriodicalIF":2.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057135/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144025207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}