B Damayanthi Jesudas, Sam Smith, Feng-Yu Yeh, Jie Zheng, John Beverley, William D Duncan, Yongqun He
{"title":"BERTopic-driven term extraction from biomedical texts toward ontology population: evaluating vaccine ontology with Plotkin's vaccines corpus.","authors":"B Damayanthi Jesudas, Sam Smith, Feng-Yu Yeh, Jie Zheng, John Beverley, William D Duncan, Yongqun He","doi":"10.1186/s13326-026-00353-w","DOIUrl":"https://doi.org/10.1186/s13326-026-00353-w","url":null,"abstract":"<p><strong>Background: </strong>Ontologies are essential for structuring biomedical knowledge, supporting semantic integration, reasoning, and data interoperability. In vaccinology, ontology population is particularly critical, as vaccines span diverse domains. A well-defined Vaccine Ontology (VO) enables consistent knowledge representation, integration across datasets, and supports applications such as decision support, literature mining, and semantic search. However, manual ontology population is tedious, time-consuming, and difficult to maintain in this dynamically evolving domain, underscoring the need for automated or semi-automated population approaches.</p><p><strong>Methods: </strong>We present a semi-automated pipeline that uses Bidirectional Encoder Representations from Transformers and Topic Modeling (BERTopic) to extract ontology-relevant concepts from biomedical text. To evaluate the effectiveness of this automated approach, the method is applied to Plotkin's Vaccines corpus, a leading reference text in vaccinology that synthesizes scientific, clinical, and policy perspectives on vaccines. 
The workflow integrates multiple natural language processing (NLP) components: document preprocessing with spaCy part-of-speech tagging and vectorization, sentence embeddings generated by a lightweight transformer model (all-MiniLM-L6-v2), dimensionality reduction with Uniform Manifold Approximation and Projection (UMAP), clustering with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and topic representation via class-based Term Frequency-Inverse Document Frequency (c-TF-IDF). To guide topic discovery toward vaccine-relevant concepts and filter irrelevant terms, the pipeline incorporates a curated set of vaccine-focused terms derived from an existing vaccine ontology as seed words to influence topic representations, while preserving the unsupervised nature of the clustering process. To enhance interpretability, the pipeline employs keyword extraction with BERT embeddings (KeyBERT) for automatic keyword-based labeling, supplemented with disambiguated descriptive labels, and Bidirectional and Auto-Regressive Transformers (BART) summarization for topic-level summaries. The resulting hierarchical topic structures are further refined through a tree-merging module that unifies multiple topic hierarchies into a coherent ontology-like representation. The extracted topics are reviewed by subject matter experts (SMEs) to filter irrelevant terms and then mapped to the Vaccine Ontology, a well-established ontology, to assess their relevance and coverage, demonstrating how automated methods can reduce the labor-intensive effort required for manual ontology population.</p><p><strong>Results: </strong>The script can be configured to generate varying numbers of topics and keywords. In this study, the top 50 topics with 10 keywords per topic were extracted for each chapter of Plotkin's Vaccines. 
The pipeline produced coherent topi","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147815427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
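The topic-representation step named in the abstract above (c-TF-IDF) can be illustrated with a minimal, standard-library sketch. It assumes one common formulation, in which a term's L1-normalized frequency within a topic is weighted by log(1 + A / f_t), where A is the average number of tokens per topic and f_t is the term's frequency across all topics. The function names `c_tf_idf` and `top_terms` and the toy tokens are hypothetical; this is not the authors' code and not the BERTopic library API.

```python
import math
from collections import Counter


def c_tf_idf(class_tokens):
    """Score terms per topic with a class-based TF-IDF weighting.

    class_tokens maps a topic id to the list of tokens pooled from all
    documents assigned to that topic (an assumed input layout).
    """
    # Term counts within each topic.
    tf = {topic: Counter(tokens) for topic, tokens in class_tokens.items()}

    # Term counts across all topics combined.
    total = Counter()
    for counts in tf.values():
        total.update(counts)

    # Average number of tokens per topic (A in the formula).
    avg_tokens = sum(total.values()) / len(class_tokens)

    scores = {}
    for topic, counts in tf.items():
        n = sum(counts.values())
        if n == 0:
            scores[topic] = {}
            continue
        scores[topic] = {
            term: (freq / n) * math.log(1 + avg_tokens / total[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, k=10):
    """Return the k highest-scoring terms per topic (ties broken alphabetically)."""
    return {
        topic: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for topic, s in scores.items()
    }


# Toy example with two hypothetical topics.
toy = {
    0: ["vaccine", "antigen", "vaccine", "dose"],
    1: ["dose", "trial", "trial", "placebo"],
}
toy_scores = c_tf_idf(toy)
print(top_terms(toy_scores, k=2))
```

Terms shared across topics (here `dose`) receive a lower weight than terms concentrated in one topic, which is the property that makes the top-ranked keywords usable as candidate ontology terms for expert review.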
William D Duncan, Amarpreet Sabharwal, Alexander D Diehl, Nivedita Dutta, Matthew Diller, Marcin P Joachimiak, Gopikrishnan M Chandrasekharan
{"title":"Representing dental caries and dysbiosis within the oral microbiome in the Oral Health and Disease Ontology.","authors":"William D Duncan, Amarpreet Sabharwal, Alexander D Diehl, Nivedita Dutta, Matthew Diller, Marcin P Joachimiak, Gopikrishnan M Chandrasekharan","doi":"10.1186/s13326-026-00350-z","DOIUrl":"https://doi.org/10.1186/s13326-026-00350-z","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"17 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13109892/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147772157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ClarID: a human-readable and compact identifier specification for biomedical metadata integration.","authors":"Manuel Rueda, Ivo G Gut","doi":"10.1186/s13326-026-00349-6","DOIUrl":"10.1186/s13326-026-00349-6","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"17 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13123180/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147772217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nivedita Dutta, Michael DeBellis, Nripen Chanda, Alexander D Diehl, Finn Wilson, Mateus Rocha, Matthew Diller, Gopikrishnan M Chandrasekharan, William D Duncan
{"title":"Representing dental restoration materials in the oral health and disease ontology.","authors":"Nivedita Dutta, Michael DeBellis, Nripen Chanda, Alexander D Diehl, Finn Wilson, Mateus Rocha, Matthew Diller, Gopikrishnan M Chandrasekharan, William D Duncan","doi":"10.1186/s13326-026-00352-x","DOIUrl":"10.1186/s13326-026-00352-x","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"17 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13088783/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147698887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge graph embedding and alignment of incomplete electronic health Records for critical care applications.","authors":"Shervin Mehryar, Michel Dumontier","doi":"10.1186/s13326-026-00351-y","DOIUrl":"https://doi.org/10.1186/s13326-026-00351-y","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147673612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joan Glenny-Pescov, Caty Chung, Nicolette Ross, Jiaming Hu, Michael Sinclair, Rabia Khurshid, Anneli Karlsson, Stephan C Schürer
{"title":"Advancing the bioassay ontology through integrated PK/PD and safety pharmacology representation.","authors":"Joan Glenny-Pescov, Caty Chung, Nicolette Ross, Jiaming Hu, Michael Sinclair, Rabia Khurshid, Anneli Karlsson, Stephan C Schürer","doi":"10.1186/s13326-025-00342-5","DOIUrl":"10.1186/s13326-025-00342-5","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"17 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12983555/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147443826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A practical and nuanced framework for entity linking evaluation.","authors":"Fuqi Xu, Goran Nenadic, Robert Stevens","doi":"10.1186/s13326-025-00339-0","DOIUrl":"10.1186/s13326-025-00339-0","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12967008/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146180025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clifford Chen, Muhammad Amith, Kirk Roberts, Rebecca Mauldin, Renata Komalasari, Cui Tao
{"title":"An application-based ontological knowledge base of medications to support health literacy and adherence for the consumer population: an aging population use case.","authors":"Clifford Chen, Muhammad Amith, Kirk Roberts, Rebecca Mauldin, Renata Komalasari, Cui Tao","doi":"10.1186/s13326-026-00347-8","DOIUrl":"10.1186/s13326-026-00347-8","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":""},"PeriodicalIF":2.0,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12924406/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146085878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontology development and use for cholangiocarcinoma risk factors and predictions: a term enrichment data analysis and machine learning classification.","authors":"Anuwat Pengput, Alexander D Diehl","doi":"10.1186/s13326-025-00345-2","DOIUrl":"10.1186/s13326-025-00345-2","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"17 1","pages":"2"},"PeriodicalIF":2.0,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12829242/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146029611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECLed- a tool supporting the effective use of the SNOMED CT Expression Constraint Language.","authors":"Tessa Ohlsen, André Sander, Josef Ingenerf","doi":"10.1186/s13326-025-00344-3","DOIUrl":"10.1186/s13326-025-00344-3","url":null,"abstract":"<p><strong>Background: </strong>The Expression Constraint Language (ECL) is a powerful query language for SNOMED CT, enabling precise semantic queries across clinical concepts. However, its complex syntax and reliance on the SNOMED CT Concept Model make it difficult for non-experts to use, limiting its broader adoption in clinical research and healthcare analytics.</p><p><strong>Objective: </strong>This work presents ECLed, a web-based tool designed to simplify access to ECL queries by abstracting the complexity of ECL syntax and the SNOMED CT Concept Model. ECLed is aimed at non-technical users, enabling the creation and modification of ECL queries and facilitating the querying of patient data coded with SNOMED CT.</p><p><strong>Methods: </strong>ECLed was developed following a detailed requirements analysis, addressing both functional and non-functional needs. The tool supports the creation and editing of SNOMED CT ECL queries, integrates a processed Concept Model, and uses FHIR terminology services for semantic validation. Its modular architecture, with a frontend based on Angular and a backend on Spring Boot, ensures seamless communication through RESTful interfaces.</p><p><strong>Result: </strong>ECLed demonstrated high usability in a user survey. Technical validation confirmed that it reliably generates and edits complex ECL queries. The tool was successfully integrated into the DaWiMed research platform, enhancing clinical analysis workflows. 
It also worked effectively with clinical data in FHIR format, although scalability with larger datasets remains to be tested.</p><p><strong>Discussion: </strong>ECLed overcomes the limitations of existing ECL tools by abstracting the complexity of both the syntax and the SNOMED CT Concept Model. It provides a user-friendly solution that enables both technical and non-technical users to easily create and edit ECL queries.</p><p><strong>Conclusion: </strong>ECLed offers a practical, user-friendly solution for creating SNOMED CT ECL queries, effectively hiding the underlying complexity while optimizing clinical research and data analysis workflows. It holds significant potential for further development and integration into additional research platforms.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"17 1","pages":"1"},"PeriodicalIF":2.0,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777381/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145911535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}