Raniah Nur Hanami, Rahmad Mahendra, Alfan Farizki Wicaksono
{"title":"Semantic classification of Indonesian consumer health questions.","authors":"Raniah Nur Hanami, Rahmad Mahendra, Alfan Farizki Wicaksono","doi":"10.1186/s13326-025-00334-5","DOIUrl":"10.1186/s13326-025-00334-5","url":null,"abstract":"<p><strong>Purpose: </strong>Online consumer health forums serve as a way for the public to connect with medical professionals. While these medical forums offer a valuable service, online Question Answering (QA) forums can struggle to deliver timely answers due to the limited number of available healthcare professionals. One way to solve this problem is by developing an automatic QA system that can provide patients with quicker answers. One key component of such a system could be a module for classifying the semantic type of a question. This would allow the system to understand the patient's intent and route them towards the relevant information.</p><p><strong>Methods: </strong>This paper proposes a novel two-step approach to address the challenge of semantic type classification in Indonesian consumer health questions. We acknowledge the scarcity of Indonesian health domain data, a hurdle for machine learning models. To address this gap, we first introduce a novel corpus of annotated Indonesian consumer health questions. Second, we utilize this newly created corpus to build and evaluate a data-driven predictive model for classifying question semantic types. To enhance the trustworthiness and interpretability of the model's predictions, we employ an explainable model framework, LIME. This framework facilitates a deeper understanding of the role played by word-based features in the model's decision-making process. 
Additionally, it empowers us to conduct a comprehensive bias analysis, allowing for the detection of \"semantic bias\", where words with no inherent association with a specific semantic type disproportionately influence the model's predictions.</p><p><strong>Results: </strong>The annotation process revealed moderate agreement between expert annotators. In addition, not all words with high LIME scores could be considered true characteristics of a question type. This suggests a potential bias in both the data used and the machine learning models themselves. Notably, XGBoost, Naïve Bayes, and MLP models exhibited a tendency to predict questions containing the words \"kanker\" (cancer) and \"depresi\" (depression) as belonging to the DIAGNOSIS category. In terms of prediction performance, Perceptron and XGBoost emerged as the top-performing models, achieving the highest weighted average F1 scores across all input scenarios and weighting factors. Naïve Bayes performed best after balancing the data with Borderline SMOTE, indicating its promise for handling imbalanced datasets.</p><p><strong>Conclusion: </strong>We constructed a corpus of query semantics in the domain of Indonesian consumer health, containing 964 questions annotated with their corresponding semantic types. This corpus served as the foundation for building a predictive model. We further investigated the impact of disease-biased words on model performance. These words exhibited high LIME scores, yet lacked association with a specific semantic type. 
We trained models using datasets with ","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"13"},"PeriodicalIF":2.0,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144731118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Beverley, Shane Babcock, Carter Benson, Giacomo De Colle, Sydney Cohen, Alexander D Diehl, Ram A N R Challa, Rachel A Mavrovich, Joshua Billig, Anthony Huffman, Yongqun He
{"title":"A fourfold pathogen reference ontology suite.","authors":"John Beverley, Shane Babcock, Carter Benson, Giacomo De Colle, Sydney Cohen, Alexander D Diehl, Ram A N R Challa, Rachel A Mavrovich, Joshua Billig, Anthony Huffman, Yongqun He","doi":"10.1186/s13326-025-00333-6","DOIUrl":"10.1186/s13326-025-00333-6","url":null,"abstract":"<p><strong>Background: </strong>Infectious diseases remain a critical global health challenge, and the integration of standardized ontologies plays a vital role in managing related data. The Infectious Disease Ontology (IDO) and its extensions, such as the Coronavirus Infectious Disease Ontology (CIDO), are essential for organizing and disseminating information related to infectious diseases. The COVID-19 pandemic highlighted the need to update IDO and its virus-specific extensions. There is an additional need to update IDO extensions specific to bacterial, fungal, and parasitic infectious diseases.</p><p><strong>Methods: </strong>The \"hub-and-spoke\" methodology is adopted to generate pathogen-specific extensions of IDO: Virus Infectious Disease Ontology (VIDO), Bacteria Infectious Disease Ontology (BIDO), Mycosis Infectious Disease Ontology (MIDO), and Parasite Infectious Disease Ontology (PIDO).</p><p><strong>Results: </strong>IDO is introduced before reporting on the scopes, major classes and relations, applications, and extensions of IDO to VIDO, BIDO, MIDO, and PIDO.</p><p><strong>Conclusions: </strong>The creation of pathogen-specific reference ontologies advances modularization and reusability of infectious disease ontologies within the IDO ecosystem. 
Future work will focus on further refining these ontologies, creating new extensions, and developing application ontologies based on them, in line with ongoing efforts to standardize biological and biomedical terminologies for improved data sharing, quality, and analysis.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"12"},"PeriodicalIF":1.6,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12239493/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144600470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lizzy Farrugia, Lilian M Azzopardi, Jeremy Debattista, Charlie Abela
{"title":"medicX-KG: a knowledge graph for pharmacists' drug information needs.","authors":"Lizzy Farrugia, Lilian M Azzopardi, Jeremy Debattista, Charlie Abela","doi":"10.1186/s13326-025-00332-7","DOIUrl":"10.1186/s13326-025-00332-7","url":null,"abstract":"<p>The role of pharmacists is evolving from medicine dispensing to delivering comprehensive pharmaceutical services within multidisciplinary healthcare teams. Central to this shift is access to accurate, up-to-date medicinal product information supported by robust data integration. Leveraging artificial intelligence and semantic technologies, Knowledge Graphs (KGs) uncover hidden relationships and enable data-driven decision-making. This paper presents medicX-KG, a pharmacist-oriented knowledge graph supporting clinical and regulatory decisions. It forms the semantic layer of the broader medicX platform, powering predictive and explainable pharmacy services. medicX-KG integrates data from three sources, the British National Formulary (BNF), DrugBank, and the Malta Medicines Authority (MMA), to address Malta's regulatory landscape, which combines European Medicines Agency alignment with partial dependence on UK supply. The KG tackles the absence of a unified national drug repository, reducing pharmacists' reliance on fragmented sources. Its design was informed by interviews with practising pharmacists to ensure real-world applicability. We detail the KG's construction, including data extraction, ontology design, and semantic mapping. Evaluation demonstrates that medicX-KG effectively supports queries about drug availability, interactions, adverse reactions, and therapeutic classes. 
Limitations, including the absence of detailed dosage encoding and of real-time updates, are discussed alongside directions for future enhancements.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"11"},"PeriodicalIF":1.6,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211240/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144540334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unveiling differential adverse event profiles in vaccines via LLM text embeddings and ontology semantic analysis.","authors":"Zhigang Wang, Xingxian Li, Jie Zheng, Yongqun He","doi":"10.1186/s13326-025-00331-8","DOIUrl":"10.1186/s13326-025-00331-8","url":null,"abstract":"<p><strong>Background: </strong>Vaccines are crucial for preventing infectious diseases; however, they may also be associated with adverse events (AEs). Conventional analysis of vaccine AEs relies on manual review and assignment of AEs to terms in a terminology or ontology, a process that is time-consuming and limited in scope. This study explores the potential of using Large Language Models (LLMs) and LLM text embeddings for efficient and comprehensive vaccine AE analysis.</p><p><strong>Results: </strong>We used the Llama-3 LLM to extract AE information from FDA-approved vaccine package inserts for 111 licensed vaccines, including 15 influenza vaccines. Text embeddings were then generated for each vaccine's AEs using the nomic-embed-text and mxbai-embed-large models. Llama-3 achieved over 80% accuracy in extracting AE text from vaccine package inserts. To further evaluate the performance of the text embeddings, the vaccines were clustered using two methods: (1) LLM text embedding-based clustering and (2) ontology-based semantic similarity analysis. The ontology-based method mapped AEs to the Human Phenotype Ontology (HPO) and the Ontology of Adverse Events (OAE), with semantic similarity analyzed using Lin's method. Compared to the semantic similarity analysis, the LLM approach was able to capture more differential AE profiles. Furthermore, LLM-derived text embeddings were used to develop a Lasso logistic regression model to predict whether a vaccine is \"Live\" or \"Non-Live\". 
The term \"Non-Live\" refers to all vaccines that do not contain live organisms, including inactivated and mRNA vaccines. A comparative analysis showed that, despite similar clustering patterns, the nomic-embed-text model outperformed mxbai-embed-large. It achieved 80.00% sensitivity, 83.06% specificity, and 81.89% accuracy in 10-fold cross-validation. Our analysis with AE LLM embeddings identified many AE patterns, examples of which are demonstrated.</p><p><strong>Conclusion: </strong>This study demonstrates the effectiveness of LLMs for automated AE extraction and analysis, and shows that LLM text embeddings capture latent information about AEs, enabling more comprehensive knowledge discovery. Our findings suggest that LLMs have substantial potential for improving vaccine safety and public health research.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"10"},"PeriodicalIF":1.6,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12102970/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144132354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vasundra Touré, Deepak Unni, Philip Krauss, Abdelhamid Abdelwahed, Jascha Buchhorn, Leon Hinderling, Thomas R Geiger, Sabine Österle
{"title":"The SPHN Schema Forge - transform healthcare semantics from human-readable to machine-readable by leveraging semantic web technologies.","authors":"Vasundra Touré, Deepak Unni, Philip Krauss, Abdelhamid Abdelwahed, Jascha Buchhorn, Leon Hinderling, Thomas R Geiger, Sabine Österle","doi":"10.1186/s13326-025-00330-9","DOIUrl":"https://doi.org/10.1186/s13326-025-00330-9","url":null,"abstract":"<p><strong>Background: </strong>The Swiss Personalized Health Network (SPHN) adopted the Resource Description Framework (RDF), a core component of the Semantic Web technology stack, for the formal encoding and exchange of healthcare data in a medical knowledge graph. The SPHN RDF Schema defines the semantics of how data elements should be represented. While RDF has proven to be machine-readable and interpretable, it can be challenging for individuals without a specialized background to read and understand the knowledge represented in RDF. For this reason, the semantics described in the SPHN RDF Schema are primarily defined in a user-accessible tabular format, the SPHN Dataset, before being translated into its RDF representation. However, this translation process was previously manual, time-consuming, and labor-intensive.</p><p><strong>Result: </strong>To automate and streamline the translation from tabular to RDF representation, the SPHN Schema Forge web service was developed. With a few clicks, this tool automatically converts an SPHN-compliant Dataset spreadsheet into an RDF schema. 
Additionally, it generates SHACL rules for data validation, an HTML visualization of the schema, and SPARQL queries for basic data analysis.</p><p><strong>Conclusion: </strong>The SPHN Schema Forge significantly reduces the manual effort and time required for schema generation, enabling researchers to focus on more meaningful tasks such as data interpretation and analysis within the SPHN framework.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"9"},"PeriodicalIF":1.6,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12063216/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144005244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.","authors":"Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono","doi":"10.1186/s13326-025-00329-2","DOIUrl":"https://doi.org/10.1186/s13326-025-00329-2","url":null,"abstract":"<p><strong>Purpose: </strong>Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can impact response times to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that the development of these three modules would enable the system to identify critical components of the question, thereby facilitating a deeper understanding of the question, and allowing for the re-formulation of more effective questions with extracted key information.</p><p><strong>Methods: </strong>This work contributes to two key aspects related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks within the Indonesian language domain by employing transformer-based models with nine distinct encoder variations. Our feature studies revealed an interdependence among these three tasks. 
Consequently, we propose several multi-task learning (MTL) models in both pairwise and three-way configurations, incorporating parallel and hierarchical architectures.</p><p><strong>Results: </strong>Using chunk-level F1-score, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, the best performance for each task was achieved by a different model, with IndoNLU<sub>LARGE</sub> obtaining the highest average score. These results suggested that a larger model did not always perform better. We also found no clear indication of whether Indonesian or multilingual language models generally performed better on our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Across our three-way MTL models with varying loss weights, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model for the SR task.</p><p><strong>Conclusion: </strong>We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points split into 773 training, 200 validation, and 200 testing instances. We then used transformer-based models to set a baseline for all three tasks. 
Our MTL experiments suggested that additional informat","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"8"},"PeriodicalIF":1.6,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057135/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144025207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Julien Ehrsam, Christophe Gaudet-Blavignac, Mirjam Mattei, Monika Baumann, Christian Lovis
{"title":"Semantics in action: a guide for representing clinical data elements with SNOMED CT.","authors":"Julien Ehrsam, Christophe Gaudet-Blavignac, Mirjam Mattei, Monika Baumann, Christian Lovis","doi":"10.1186/s13326-025-00326-5","DOIUrl":"10.1186/s13326-025-00326-5","url":null,"abstract":"<p><strong>Background: </strong>Clinical data is abundant, but meaningful reuse remains limited. Semantic representation using SNOMED CT can improve research, public health, and quality of care. However, the lack of applied guidelines to industrialise the process hinders sustainability and reproducibility. This work describes a guide for semantic representation of data elements with SNOMED CT, addressing challenges encountered during its application. The representation of the institutional data warehouse started with the guidelines proposed by SNOMED International and other groups. However, the large-scale application of manual, expert-driven representation led to the development of additional rules.</p><p><strong>Results: </strong>An eight-rule step-by-step guide was developed iteratively through focus groups. The rules are continuously refined through use and growing coverage, and are tested in practice to ensure they achieve the desired outcome. All rules prioritize maintaining semantic accuracy, which is the main goal of our strategy. They are divided into four groups, which apply to understanding the data correctly (Context) and to using SNOMED CT properly (Single concepts first, Approved post-coordination, Extending post-coordination).</p><p><strong>Conclusions: </strong>This work provides a practical framework for semantic representation using SNOMED CT, enabling greater accuracy and consistency by promoting a common method. 
In addressing the challenges of large-scale implementation, the guide supports the shift from data-centric models to a semantic-centric approach, enabling interoperability and more effective reuse of clinical data.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"7"},"PeriodicalIF":1.6,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948947/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143730232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sebastian Duesing, Jason Bennett, James A Overton, Randi Vita, Bjoern Peters
{"title":"Standardizing free-text data exemplified by two fields from the Immune Epitope Database.","authors":"Sebastian Duesing, Jason Bennett, James A Overton, Randi Vita, Bjoern Peters","doi":"10.1186/s13326-025-00324-7","DOIUrl":"10.1186/s13326-025-00324-7","url":null,"abstract":"<p><strong>Background: </strong>While unstructured data, such as free text, constitutes a large amount of publicly available biomedical data, it is underutilized in automated analyses due to the difficulty of extracting meaning from it. Normalizing free-text data, i.e., removing inessential variance, enables the use of structured vocabularies like ontologies to represent the data and allow for harmonized queries over it. This paper presents an adaptable tool for free-text normalization and an evaluation of the application of this tool to two different fields curated from the literature in the Immune Epitope Database (IEDB): \"age\" and \"data-location\" (the part of a paper in which data was found).</p><p><strong>Results: </strong>Free text entries for the database fields for subject age (4095 distinct values) and publication data-location (251,810 distinct values) in the IEDB were analyzed. Normalization was performed in three steps, namely character normalization, word normalization, and phrase normalization, using generalizable rules developed and applied with the tool presented in this manuscript. For the age dataset, in the character stage, the application of 21 rules resulted in 99.97% output validity; in the word stage, the application of 94 rules resulted in 98.06% output validity; and in the phrase stage, the application of 16 rules resulted in 83.81% output validity. 
For the data-location dataset, in the character stage, the application of 39 rules resulted in 99.99% output validity; in the word stage, the application of 187 rules resulted in 98.46% output validity; and in the phrase stage, the application of 12 rules resulted in 97.95% output validity.</p><p><strong>Conclusions: </strong>We developed a generalizable approach for normalization of free text as found in database fields with content on a specific topic. Creating and testing the rules for a given field is a one-time effort; the rules can then be applied to data as it is being curated. The standardization achieved in the two datasets tested significantly reduces variance in the content, which enhances the findability and usability of that data, chiefly by improving search functionality and enabling linkages with formal ontologies.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"5"},"PeriodicalIF":1.6,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929277/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143692223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital evolution: Novo Nordisk's shift to ontology-based data management.","authors":"Shawn Zheng Kai Tan, Shounak Baksi, Thomas Gade Bjerregaard, Preethi Elangovan, Thrishna Kuttikattu Gopalakrishnan, Darko Hric, Joffrey Joumaa, Beidi Li, Kashif Rabbani, Santhosh Kannan Venkatesan, Joshua Daniel Valdez, Saritha Vettikunnel Kuriakose","doi":"10.1186/s13326-025-00327-4","DOIUrl":"10.1186/s13326-025-00327-4","url":null,"abstract":"<p>The amount of biomedical data is growing, and managing it is increasingly challenging. While the Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. Here, we include both our technical blueprint and our approach to organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organization's digital aspirations for data federation and discovery fuelled by artificial intelligence. 
Our aim for this paper is to share the lessons learned in order to foster dialogue with parties navigating similar waters while collectively advancing the efforts in the fields of data management, semantics and data driven drug discovery.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"6"},"PeriodicalIF":1.6,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929979/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143692220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New and revised gene ontology biological process terms describe multiorganism interactions critical for understanding microbial pathogenesis and sequences of concern.","authors":"Gene Godbold, Jody Proescher, Pascale Gaudet","doi":"10.1186/s13326-025-00323-8","DOIUrl":"10.1186/s13326-025-00323-8","url":null,"abstract":"<p><strong>Background: </strong>There is a new framework from the United States government for screening synthetic nucleic acids. Beginning in October of 2026, it calls for the screening of sequences 50 nucleotides or longer that are known to contribute to pathogenicity or toxicity for humans, regardless of the taxon from which they originate. Distinguishing sequences that encode pathogenic and toxic functions from those that lack them is not simple.</p><p><strong>Objectives: </strong>Our project scope was to discern, describe, and catalog sequences involved in microbial pathogenesis from the scientific literature. We recognize a need for better terminology to designate pathogenic functions that are relevant across the entire range of existing parasites.</p><p><strong>Methods: </strong>We canvassed publications investigating microbial pathogens of humans, other animals, and some plants to collect thousands of sequences that enable the exploitation of hosts. We compared sequences to each other, grouping them according to which host biological processes they subvert and the consequence(s) for the host. We developed terms to capture many of the varied pathogenic functions of sequences employed by parasitic microbes for host exploitation and applied these terms systematically to our dataset of sequences.</p><p><strong>Results/conclusions: </strong>The enhanced and expanded terms enable a quick and pertinent evaluation of a sequence's ability to endow a microbe with pathogenic function when they are appropriately applied to relevant sequences. 
This will allow providers of synthetic nucleic acids to rapidly assess sequences ordered by their customers for pathogenic capacity. This will help fulfill the new US government guidance.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"4"},"PeriodicalIF":1.6,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11927349/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143669953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}