Journal of Biomedical Semantics: latest articles (page 2)

Annotating and indexing scientific articles with rare diseases.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2026-01-06 DOI: 10.1186/s13326-025-00346-1

Hosein Azarbonyad, Zubair Afzal, Rik Iping, Max Dumoulin, Ilse Nederveen, Jiangtao Yu, Georgios Tsatsaronis

Abstract

Background: Around 30 million people in Europe are affected by a rare (or orphan) disease, defined as a condition occurring in fewer than 1 in 2,000 individuals. The primary challenge is to automatically and efficiently identify scientific articles and guidelines that address a particular rare disease. We present a novel methodology to annotate and index scientific text with taxonomical concepts describing rare diseases from the OrphaNet taxonomy. This task is complicated by several technical challenges, including the lack of sufficiently large, human-annotated datasets for supervised training and the polysemy/synonymy and surface-form variation of rare disease names, which can hinder any annotation engine.

Results: We introduce a framework that operationalizes OrphaNet for large-scale literature annotation by integrating the TERMite engine with curated synonym expansion, label normalization (including deprecated/renamed concepts), and fuzzy matching. On benchmark datasets, the approach achieves precision = 92%, recall = 75%, and F1 = 83%, outperforming a string-matching baseline. Applying the pipeline to Scopus produces disease-specific corpora suitable for bibliometric and scientometric analyses (e.g., institution, country, and subject-area profiles). These outputs power the Rare Diseases Monitor dashboard for exploring national and global research activity.

Conclusion: To our knowledge, this is the first systematic, scalable semantic framework for annotating and indexing rare disease literature at scale. By operationalizing OrphaNet in an automated, reproducible pipeline and addressing data scarcity and lexical variability, the work advances biomedical semantics for rare diseases and enables disease-centric monitoring, evaluation, and discovery across the research landscape.

Pages: 3. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870340/pdf/
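The F1 value reported in this abstract can be checked from the stated precision and recall, since F1 is their harmonic mean. The short Python sketch below does only that arithmetic; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract (92% and 75%).
f1 = f1_score(0.92, 0.75)
print(f"F1 = {f1:.1%}")  # about 82.6%, which rounds to the reported 83%
```

The harmonic mean sits closer to the lower of the two inputs, which is why the score lands nearer the 75% recall than the 92% precision.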

Citations: 0

SimSUM - simulated benchmark with structured and unstructured medical records.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-12-18 DOI: 10.1186/s13326-025-00341-6

Paloma Rabaey, Stefan Heytens, Thomas Demeester

Citations: 0

BabelFSH-a toolkit for an effective HL7 FHIR-based terminology provision.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-11-29 DOI: 10.1186/s13326-025-00343-4

Joshua Wiedekopf, Tessa Ohlsen, Ann-Kristin Kock-Schoppenhauer, Josef Ingenerf

Abstract

Background: HL7 FHIR terminological services (TS) are a valuable tool towards better healthcare interoperability, but require representations of terminologies using FHIR resources to provide their services. As most terminologies are not natively distributed using FHIR resources, converters are needed. Large-scale FHIR projects, especially those with a national or even an international scope, define enormous numbers of value sets and reference many large and complex code systems, which must be regularly updated in TS and other systems. This necessitates a flexible, scalable and efficient provision of these artifacts. This work aims to develop a comprehensive, extensible and accessible toolkit for FHIR terminology conversion, making it possible for terminology authors, FHIR profilers and other actors to provide standardized TS for large-scale terminological artifacts.

Implementation: Based on the prevalent HL7 FHIR Shorthand (FSH) specification, a converter toolkit, called BabelFSH, was created that utilizes an adaptable plugin architecture to separate the definition of content from that of the needed declarative metadata. The development process was guided by formalized design goals.

Results: All eight design goals were addressed by BabelFSH. Validation of the system's performance and completeness was demonstrated by way of example using Alpha-ID-SE, an important terminology used for diagnosis coding, especially of rare diseases, within Germany. The tool is now used extensively within the content delivery pipeline for a central FHIR TS with a national scope within the German Medical Informatics Initiative and Network University Medicine and demonstrates adequate usability for FHIR developers.

Discussion: The first development focus was geared towards the requirements of the central research FHIR TS for the federated FHIR infrastructure in Germany, and has proven to be very useful towards that goal. Opportunities for further improvement were identified in the validation process especially, as the validation messages are currently imprecise at times. The design of the application lends itself to the implementation of further use cases, such as direct connectivity to legacy systems for catalog conversion to FHIR.

Conclusions: The developed BabelFSH tool is a novel, powerful and open-source approach to making heterogeneous sources of terminological knowledge accessible as FHIR resources, thus aiding semantic interoperability in healthcare in general.

Pages: 19. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12679771/pdf/

Citations: 0

The CLEAR Principle: organizing data and metadata into semantically meaningful types of FAIR Digital Objects to increase their human explorability and cognitive interoperability.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-10-28 DOI: 10.1186/s13326-025-00340-7

Lars Vogt

Abstract

Background: Ensuring the FAIRness (Findable, Accessible, Interoperable, Reusable) of data and metadata is an important goal in both research and industry. Knowledge graphs and ontologies have been central in achieving this goal, with interoperability of data and metadata receiving much attention. This paper argues that the emphasis on machine-actionability has overshadowed the essential need for human-actionability of data and metadata, and provides three examples that describe the lack of human-actionability within knowledge graphs.

Results: The paper advocates incorporating cognitive interoperability as another vital layer within the European Open Science Cloud Interoperability Framework and discusses the relation between human explorability of data and metadata and their cognitive interoperability. It suggests adding the CLEAR Principle to support the cognitive interoperability and human contextual explorability of data and metadata. The subsequent sections present the concept of semantic units, elucidating their important role in attaining CLEAR. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs, each represented with its own resource that constitutes a FAIR Digital Object (FDO) and that instantiates a corresponding FDO class. Various categories of FDOs are distinguished. Each semantic unit can be displayed in a user interface either as a mind-map-like graph or as natural language text.

Conclusions: Semantic units organize knowledge graphs into levels of representational granularity, distinct granularity trees, and diverse frames of reference. This organization supports the cognitive interoperability of data and metadata and facilitates their contextual explorability by humans. The development of innovative user interfaces enabled by FDOs that are based on semantic units would empower users to access, navigate, and explore information in CLEAR knowledge graphs with optimized efficiency.

Volume 16 (1), pages 18. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12570660/pdf/

Citations: 0

Three-layered semantic framework for public health intelligence.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-09-15 DOI: 10.1186/s13326-025-00338-1

Sathvik Guru Rao, Pranitha Rokkam, Bide Zhang, Astghik Sargsyan, Abish Kaladharan, Priya Sethumadhavan, Marc Jacobs, Martin Hofmann-Apitius, Alpha Tom Kodamullil

Abstract

Background: Disease surveillance systems play a crucial role in monitoring and preventing infectious diseases. However, the current landscape, primarily focused on fragmented health data, poses challenges to contextual understanding and decision-making. This paper addresses this issue by proposing a semantic framework using ontologies to provide a unified data representation for seamless integration. The paper demonstrates the effectiveness of this approach using a case study of a COVID-19 incident at a football game in Italy.

Method: In this study, we undertook a comprehensive approach to gather and analyze data for the development of ontologies within the realm of pandemic intelligence. Multiple ontologies were meticulously crafted to cater to different domains related to pandemic intelligence, such as healthcare systems, mass gatherings, travel, and diseases. The ontologies were classified into top-level, domain, and application layers. This classification facilitated the development of a three-layered architecture, promoting reusability and consistency in knowledge representation and serving as the backbone of our semantic framework.

Result: Through the utilization of our semantic framework, we accomplished semantic enrichment of both structured and unstructured data. The integration of data from diverse sources involved mapping to ontology concepts, leading to the creation and storage of RDF triples in the triple store. This process resulted in the construction of linked data, ultimately enhancing the discoverability and accessibility of valuable insights. Furthermore, our anomaly detection algorithm effectively leveraged knowledge graphs extracted from the triple store, employing semantic relationships to discern patterns and anomalies within the data. Notably, this capability was exemplified by the identification of correlations between a football game and a COVID-19 event occurring at the same location and time.

Conclusion: The framework showcased its capability to address intricate, multi-domain queries and support diverse levels of detail. Additionally, it demonstrated proficiency in data analysis and visualization, generating graphs that depict patterns and trends; however, challenges related to ontology maintenance, alignment, and mapping must be addressed for the approach's optimal utilization.

Volume 16 (1), pages 17. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12439389/pdf/

Citations: 0

A prototype ETL pipeline that uses HL7 FHIR RDF resources when deploying pure functions to enrich knowledge graph patient data.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-09-01 DOI: 10.1186/s13326-025-00335-4

Adeel Ansari, Marisa Conte, Allen Flynn, Avanti Paturkar

Abstract

Background: For clinical care and research, knowledge graphs with patient data can be enriched by extracting parameters from a knowledge graph and then using them as inputs to compute new patient features with pure functions. Systematic and transparent methods for enriching knowledge graphs with newly computed patient features are of interest. When enriching the patient data in knowledge graphs this way, existing ontologies and well-known data resource standards can help promote semantic interoperability.

Results: We developed and tested a new data processing pipeline for extracting, computing, and returning newly computed results to a large knowledge graph populated with electronic health record and patient survey data. We show that RDF data resource types already specified by Health Level 7's FHIR RDF effort can be programmatically validated and then used by this new data processing pipeline to represent newly derived patient-level features.

Conclusions: Knowledge graph technology can be augmented with standards-based semantic data processing pipelines for deploying and tracing the use of pure functions to derive new patient-level features from existing data. Semantic data processing pipelines enable research enterprises to report on new patient-level computations of interest with linked metadata that details the origin and background of every new computation.

Volume 16 (1), pages 16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400713/pdf/
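To make the extract, compute, and return pattern in this abstract concrete, here is a minimal Python sketch. It is not the authors' pipeline and does not use the FHIR RDF vocabulary: the `Triple` type, the `ex:` predicates, the BMI example, and the provenance label `ex:bmi-function-v1` are all hypothetical names chosen for illustration, and the graph is a plain in-memory list.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: object


def bmi(weight_kg: float, height_m: float) -> float:
    """Pure function: the result depends only on the inputs, with no side effects."""
    return round(weight_kg / height_m ** 2, 1)


def extract(graph, subject, predicate):
    """Return the first object for (subject, predicate); raise KeyError if absent."""
    for t in graph:
        if t.subject == subject and t.predicate == predicate:
            return t.obj
    raise KeyError((subject, predicate))


def enrich(graph, patient):
    """Extract inputs, apply the pure function, and return a NEW graph that
    also holds the derived feature plus a provenance triple (hypothetical)."""
    weight = extract(graph, patient, "ex:weightKg")
    height = extract(graph, patient, "ex:heightM")
    obs = f"{patient}/obs/bmi"
    derived = [
        Triple(obs, "ex:subject", patient),
        Triple(obs, "ex:value", bmi(weight, height)),
        Triple(obs, "ex:computedBy", "ex:bmi-function-v1"),
    ]
    return graph + derived  # the input list is left unchanged


graph = [
    Triple("ex:patient1", "ex:weightKg", 70.0),
    Triple("ex:patient1", "ex:heightM", 1.75),
]
enriched = enrich(graph, "ex:patient1")
```

Because `bmi` is pure and `enrich` returns a new list, the same inputs always give the same derived triples, which is what makes each computation traceable through its provenance triple.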

Citations: 0

Mapping between clinical and preclinical terminologies: eTRANSAFE's Rosetta stone approach.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-08-21 DOI: 10.1186/s13326-025-00337-2

Erik M van Mulligen, Rowan Parry, Johan van der Lei, Jan A Kors

Abstract

Background: The eTRANSAFE project developed tools that support translational research. One of the challenges in this project was to combine preclinical and clinical data, which are coded with different terminologies and granularities, and are expressed as single pre-coordinated, clinical concepts and as combinations of preclinical concepts from different terminologies. This study develops and evaluates the Rosetta Stone approach, which maps combinations of preclinical concepts to clinical, pre-coordinated concepts, allowing for different levels of exactness of mappings.

Methods: Concepts from preclinical and clinical terminologies used in eTRANSAFE have been mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). SNOMED CT acts as an intermediary terminology that provides the semantics to bridge between pre-coordinated clinical concepts and combinations of preclinical concepts with different levels of granularity. The mappings from clinical terminologies to SNOMED CT were taken from existing resources, while mappings from the preclinical terminologies to SNOMED CT were manually created. A coordination template defines the relation types that can be explored for a mapping and assigns a penalty score that reflects the inexactness of the mapping. A subset of 60 pre-coordinated concepts was mapped both with the Rosetta Stone semantic approach and with a lexical term matching approach. Both results were manually evaluated.

Results: A total of 34,308 concepts from preclinical terminologies (Histopathology terminology, Standard for Exchange of Nonclinical Data (SEND) code lists, Mouse Adult Gross Anatomy Ontology) and a clinical terminology (MedDRA) were mapped to SNOMED CT as the intermediary bridging terminology. A terminology service has been developed that dynamically returns the exact and inexact mappings between preclinical and clinical concepts. On the evaluation set, the precision of the mappings from the terminology service was high (95%), much higher than for lexical term matching (22%).

Conclusion: The Rosetta Stone approach uses a semantically rich intermediate terminology to map between pre-coordinated clinical concepts and a combination of preclinical concepts with different levels of exactness. The ability to generate not only exact but also inexact mappings makes it possible to relate larger amounts of preclinical and clinical data, which can be helpful in translational use cases.

Volume 16 (1), pages 15. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12372267/pdf/
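The idea of scoring inexact mappings with a penalty can be sketched in Python as follows. This is an illustration only: the abstract does not give the coordination template or its penalty values, so the toy hierarchy, the (morphology, site) decomposition, and the rule of one penalty point per is-a step are all assumptions made here, and none of the concept names are real SNOMED CT or MedDRA codes.

```python
# Hypothetical is-a hierarchy in the intermediary terminology (child -> parent).
PARENT = {"hepatocyte": "liver", "liver": "abdominal organ"}

# Hypothetical pre-coordinated clinical concepts, decomposed as (morphology, site).
CLINICAL = {
    "hepatic necrosis": ("necrosis", "liver"),
    "renal necrosis": ("necrosis", "kidney"),
}


def steps_up(child, ancestor):
    """Number of is-a steps from child up to ancestor, or None if unrelated."""
    steps = 0
    node = child
    while node is not None:
        if node == ancestor:
            return steps
        node = PARENT.get(node)
        steps += 1
    return None


def map_preclinical(finding, site):
    """Map a (finding, site) combination to clinical concepts.

    Returns (concept, penalty) pairs sorted by penalty; 0 means an exact match,
    and each is-a step between the sites adds 1 (an assumed scoring rule).
    """
    results = []
    for name, (morphology, clinical_site) in CLINICAL.items():
        if morphology != finding:
            continue
        penalty = steps_up(site, clinical_site)
        if penalty is not None:
            results.append((name, penalty))
    return sorted(results, key=lambda pair: pair[1])


exact = map_preclinical("necrosis", "liver")
inexact = map_preclinical("necrosis", "hepatocyte")
```

Here `exact` holds a zero-penalty match, while `inexact` reaches the same clinical concept through one hierarchy step and so carries a penalty of 1; a caller could filter on a maximum penalty to trade recall for exactness.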

Citations: 0

BASIL DB: bioactive semantic integration and linking database.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-08-13 DOI: 10.1186/s13326-025-00336-3

David Jackson, Paul Groth, Hazar Harmouch

Abstract

Background: Bioactive compounds found in foods and plants can provide health benefits, including antioxidant and anti-inflammatory effects. Research into their role in disease prevention and personalized nutrition is expanding, but challenges such as data complexity, inconsistent methods, and the rapid growth of scientific literature can hinder progress. To address these issues, we developed BASIL DB (BioActive Semantic Integration and Linking Database), a knowledge graph (KG) database that leverages natural language processing (NLP) techniques to streamline data organization and analysis. This automated approach offers greater scalability and comprehensiveness than traditional methods such as manual data curation and entry.

Construction and content: The process of constructing the BASIL DB is divided into four fundamental steps: data collection, data preprocessing, data extraction, and data integration. Data on bioactives and foods are sourced from structured databases. The relevant randomized controlled trials (RCTs) were extracted from PubMed. The data are then prepared by cleaning inconsistencies and structuring them for analysis. In the data extraction phase, NLP tools, including a large language model (LLM), are utilized to analyze clinical trials and extract data on bioactive compounds and their health impacts. The integration phase compiles these data into a knowledge graph, which consists of the entities Foods, Bioactives, and Health Conditions as nodes and their interactions as edges. To quantify the relationships/interactions between these entities, we generate a weight for each edge on the basis of empirical evidence and methodological rigor.

Utility and discussion: The BASIL DB incorporates 433 compounds, 40,296 research papers, 7,256 health effects, and 4,197 food items. The database features query and visualization capabilities, including interactive graphs and custom filtering options, that showcase different aspects of the data. Users are able to explore the relationships between bioactives and health effects, enhancing both research efficiency and insight discovery.

Conclusion: The BASIL DB is a knowledge graph database of bioactive compounds. This study provides a structured resource for exploring the relationships among bioactives, foods, and health outcomes, representing a step toward a more systematic and data-driven approach to understanding the health effects of bioactive compounds. Future work will focus on expanding the database and refining the utilized methods. Extending the BASIL DB will help bridge the gap between traditional and conventional approaches to nutrition, guiding future research in bioactive compound discovery and health optimization.

Availability: Users can access and explore the data via https://basil-db.github.io/info.html or fork and run the respective script via https://github.com/basil-db/scr

Volume 16 (1), pages 14. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12351831/pdf/
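The abstract says each knowledge-graph edge receives a weight based on empirical evidence and methodological rigor, but it does not give the formula. The Python sketch below shows one plausible shape for such a weighting, purely as an assumption: a rigor-weighted average of effect directions per (bioactive, condition) pair. The record fields, the placeholder entity names, and the scoring rule are invented for illustration and are not BASIL DB's actual method or data.

```python
from collections import defaultdict

# Toy trial records with placeholder names; "effect" is +1 (beneficial) or
# -1 (adverse), and "rigor" is a hypothetical quality score in (0, 1].
trials = [
    {"bioactive": "bioactive_A", "condition": "condition_X", "effect": +1, "rigor": 0.9},
    {"bioactive": "bioactive_A", "condition": "condition_X", "effect": +1, "rigor": 0.6},
    {"bioactive": "bioactive_A", "condition": "condition_X", "effect": -1, "rigor": 0.3},
    {"bioactive": "bioactive_B", "condition": "condition_X", "effect": -1, "rigor": 0.5},
]


def edge_weights(records):
    """Rigor-weighted mean effect per (bioactive, condition) edge.

    Assumes every rigor score is strictly positive, so the denominator
    for each edge is never zero.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for r in records:
        key = (r["bioactive"], r["condition"])
        numerator[key] += r["effect"] * r["rigor"]
        denominator[key] += r["rigor"]
    return {key: numerator[key] / denominator[key] for key in numerator}


weights = edge_weights(trials)
```

Under this assumed rule a weight lies between -1 and +1, and a rigorous trial moves the edge more than a weak one; the first edge above comes out at about 0.67 because two stronger positive trials outweigh one weaker negative trial.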

Citations: 0

Semantic classification of Indonesian consumer health questions.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-07-28 DOI: 10.1186/s13326-025-00334-5

Raniah Nur Hanami, Rahmad Mahendra, Alfan Farizki Wicaksono

Abstract

Purpose: Online consumer health forums serve as a way for the public to connect with medical professionals. While these medical forums offer a valuable service, online Question Answering (QA) forums can struggle to deliver timely answers due to the limited number of available healthcare professionals. One way to solve this problem is by developing an automatic QA system that can provide patients with quicker answers. One key component of such a system could be a module for classifying the semantic type of a question. This would allow the system to understand the patient's intent and route them towards the relevant information.

Methods: This paper proposes a novel two-step approach to address the challenge of semantic type classification in Indonesian consumer health questions. We acknowledge the scarcity of Indonesian health domain data, a hurdle for machine learning models. To address this gap, we first introduce a novel corpus of annotated Indonesian consumer health questions. Second, we utilize this newly created corpus to build and evaluate a data-driven predictive model for classifying question semantic types. To enhance the trustworthiness and interpretability of the model's predictions, we employ an explainable model framework, LIME. This framework facilitates a deeper understanding of the role played by word-based features in the model's decision-making process. Additionally, it empowers us to conduct a comprehensive bias analysis, allowing for the detection of "semantic bias", where words with no inherent association with a specific semantic type disproportionately influence the model's predictions.

Results: The annotation process revealed moderate agreement between expert annotators. In addition, not all words with high LIME probability could be considered true characteristics of a question type. This suggests a potential bias in the data used and the machine learning models themselves. Notably, XGBoost, Naïve Bayes, and MLP models exhibited a tendency to predict questions containing the words "kanker" (cancer) and "depresi" (depression) as belonging to the DIAGNOSIS category. In terms of prediction performance, Perceptron and XGBoost emerged as the top-performing models, achieving the highest weighted average F1 scores across all input scenarios and weighting factors. Naïve Bayes performed best after balancing the data with Borderline SMOTE, indicating its promise for handling imbalanced datasets.

Conclusion: We constructed a corpus of query semantics in the domain of Indonesian consumer health, containing 964 questions annotated with their corresponding semantic types. This corpus served as the foundation for building a predictive model. We further investigated the impact of disease-biased words on model performance. These words exhibited high LIME scores, yet lacked association with a specific semantic type. We trained models using datasets with ...

Volume 16 (1), pages 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12302743/pdf/

Citations: 0

A fourfold pathogen reference ontology suite.

IF 2, Zone 3 (Engineering & Technology)

Journal of Biomedical Semantics Pub Date : 2025-07-09 DOI: 10.1186/s13326-025-00333-6

John Beverley, Shane Babcock, Carter Benson, Giacomo De Colle, Sydney Cohen, Alexander D Diehl, Ram A N R Challa, Rachel A Mavrovich, Joshua Billig, Anthony Huffman, Yongqun He

Abstract

Background: Infectious diseases remain a critical global health challenge, and the integration of standardized ontologies plays a vital role in managing related data. The Infectious Disease Ontology (IDO) and its extensions, such as the Coronavirus Infectious Disease Ontology (CIDO), are essential for organizing and disseminating information related to infectious diseases. The COVID-19 pandemic highlighted the need for updating IDO and its virus-specific extensions. There is an additional need to update IDO extensions specific to bacterial, fungal, and parasitic infectious diseases.

Methods: The "hub-and-spoke" methodology is adopted to generate pathogen-specific extensions of IDO: Virus Infectious Disease Ontology (VIDO), Bacteria Infectious Disease Ontology (BIDO), Mycosis Infectious Disease Ontology (MIDO), and Parasite Infectious Disease Ontology (PIDO).

Results: IDO is introduced before reporting on the scopes, major classes and relations, applications and extensions of IDO to VIDO, BIDO, MIDO, and PIDO.

Conclusions: The creation of pathogen-specific reference ontologies advances modularization and reusability of infectious disease ontologies within the IDO ecosystem. Future work will focus on further refining these ontologies, creating new extensions, and developing application ontologies based on them, in line with ongoing efforts to standardize biological and biomedical terminologies for improved data sharing, quality, and analysis.

Volume 16 (1), pages 12. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12239493/pdf/

Citations: 0