Journal of Biomedical Semantics: Latest Articles

Unveiling differential adverse event profiles in vaccines via LLM text embeddings and ontology semantic analysis.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-05-23 | DOI: 10.1186/s13326-025-00331-8
Zhigang Wang, Xingxian Li, Jie Zheng, Yongqun He
Background: Vaccines are crucial for preventing infectious diseases; however, they may also be associated with adverse events (AEs). Conventional analysis of vaccine AEs relies on manual review and assignment of AEs to terminology or ontology terms, a process that is time-consuming and limited in scope. This study explores the potential of Large Language Models (LLMs) and LLM text embeddings for efficient and comprehensive vaccine AE analysis.
Results: We used the Llama-3 LLM to extract AE information from the FDA-approved package inserts of 111 licensed vaccines, including 15 influenza vaccines. Llama-3 achieved over 80% accuracy in extracting AE text from the package inserts. Text embeddings were then generated for each vaccine's AE description using the nomic-embed-text and mxbai-embed-large models. To evaluate the text embeddings, the vaccines were clustered using two methods: (1) LLM text embedding-based clustering and (2) ontology-based semantic similarity analysis. The ontology-based method mapped AEs to the Human Phenotype Ontology (HPO) and the Ontology of Adverse Events (OAE) and measured semantic similarity using Lin's method. Compared with the semantic similarity analysis, the LLM approach captured more differential AE profiles. Furthermore, the LLM-derived text embeddings were used to build a Lasso logistic regression model that predicts whether a vaccine is "Live" or "Non-Live". "Non-Live" refers to all vaccines that do not contain live organisms, including inactivated and mRNA vaccines. A comparative analysis showed that, although the two embedding models produced similar clustering patterns, nomic-embed-text outperformed mxbai-embed-large, achieving 80.00% sensitivity, 83.06% specificity, and 81.89% accuracy in 10-fold cross-validation. Our analysis of the AE LLM embeddings identified many AE patterns, which are illustrated with examples.
Conclusion: This study demonstrates the effectiveness of LLMs for automated AE extraction and analysis. LLM text embeddings capture latent information about AEs, enabling more comprehensive knowledge discovery. Our findings suggest that LLMs have substantial potential for improving vaccine safety and public health research.
Journal of Biomedical Semantics 16(1): 10 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12102970/pdf/
Citations: 0
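Lin's method, used for the ontology-based comparison in this study, scores two terms by the information content (IC) of their most informative common ancestor (MICA): sim(t1, t2) = 2 * IC(MICA) / (IC(t1) + IC(t2)), with IC(c) = -log p(c). The Python sketch below computes this on a hypothetical adverse-event hierarchy with invented annotation counts. The term names and counts are assumptions, not OAE or HPO content. The sketch stops at term-to-term scores; it does not reproduce how the study aggregated term scores into vaccine-level similarity.

```python
import math

# Hypothetical toy ontology (child -> parents), rooted at "adverse event".
# Term names are illustrative only; they are not real OAE or HPO identifiers.
PARENTS = {
    "adverse event": [],
    "nervous system AE": ["adverse event"],
    "injection-site AE": ["adverse event"],
    "headache": ["nervous system AE"],
    "dizziness": ["nervous system AE"],
    "injection-site pain": ["injection-site AE"],
}

# Hypothetical counts of vaccines annotated directly with each leaf term.
COUNTS = {"headache": 40, "dizziness": 20, "injection-site pain": 60}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(c) = -log p(c) = log(root_total / n_c); n_c counts c and its descendants."""
    totals = dict.fromkeys(PARENTS, 0)
    for term, n in COUNTS.items():
        for a in ancestors(term):
            totals[a] += n
    root_total = totals["adverse event"]
    return {t: math.log(root_total / n) for t, n in totals.items() if n > 0}


def lin_similarity(t1, t2, ic):
    """Lin's measure: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    mica_ic = max(ic[c] for c in ancestors(t1) & ancestors(t2))
    denom = ic[t1] + ic[t2]
    return 1.0 if denom == 0 else 2 * mica_ic / denom


ic = information_content()
print(round(lin_similarity("headache", "dizziness", ic), 3))    # → 0.48
print(lin_similarity("headache", "injection-site pain", ic))    # → 0.0
```

Terms under different branches share only the root, whose IC is 0, so their Lin similarity is 0. Siblings score according to how specific their shared parent is.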
The SPHN Schema Forge - transform healthcare semantics from human-readable to machine-readable by leveraging semantic web technologies.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-05-08 | DOI: 10.1186/s13326-025-00330-9
Vasundra Touré, Deepak Unni, Philip Krauss, Abdelhamid Abdelwahed, Jascha Buchhorn, Leon Hinderling, Thomas R Geiger, Sabine Österle
Background: The Swiss Personalized Health Network (SPHN) adopted the Resource Description Framework (RDF), a core component of the Semantic Web technology stack, for the formal encoding and exchange of healthcare data in a medical knowledge graph. The SPHN RDF Schema defines how data elements should be represented semantically. Although RDF has proven to be machine-readable and interpretable, people without a specialized background can find it difficult to read and understand the knowledge it represents. For this reason, the semantics described in the SPHN RDF Schema are first defined in a user-accessible tabular format, the SPHN Dataset, and then translated into their RDF representation. However, this translation was previously manual, time-consuming, and labor-intensive.
Result: The SPHN Schema Forge web service was developed to automate and streamline the translation from the tabular to the RDF representation. With a few clicks, the tool converts an SPHN-compliant Dataset spreadsheet into an RDF schema. It also generates SHACL rules for data validation, an HTML visualization of the schema, and SPARQL queries for basic data analysis.
Conclusion: The SPHN Schema Forge substantially reduces the manual effort and time required for schema generation. Researchers can therefore focus on more meaningful tasks such as data interpretation and analysis within the SPHN framework.
Journal of Biomedical Semantics 16(1): 9 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12063216/pdf/
Citations: 0
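The abstract describes converting a tabular schema definition into RDF plus SHACL validation rules. The sketch below illustrates that idea only; it is not the Schema Forge implementation. It turns rows of a hypothetical spreadsheet into Turtle output: OWL class and datatype-property declarations, plus one SHACL `sh:NodeShape` per concept. The columns `concept`, `property`, `datatype`, and `min_count` are invented for this example, and the `ex:` namespace is a placeholder.

```python
import csv
import io

# Hypothetical namespace and a hypothetical, much-simplified spreadsheet layout;
# the real SPHN Dataset columns and SPHN namespaces are different.
PREFIX = "https://example.org/schema#"
SHEET = """concept,property,datatype,min_count
Allergy,hasCode,xsd:string,1
Allergy,hasOnsetDate,xsd:dateTime,0
"""


def to_turtle(sheet_text):
    """Turn tabular rows into OWL declarations plus one SHACL shape per concept."""
    rows = list(csv.DictReader(io.StringIO(sheet_text)))
    out = [
        f"@prefix ex: <{PREFIX}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "",
    ]
    concepts = sorted({r["concept"] for r in rows})
    for c in concepts:
        out.append(f"ex:{c} a owl:Class .")
    for r in rows:  # schema part: one datatype property per row
        out.append(
            f"ex:{r['property']} a owl:DatatypeProperty ; "
            f"rdfs:domain ex:{r['concept']} ; rdfs:range {r['datatype']} ."
        )
    for c in concepts:  # validation part: a SHACL node shape with property constraints
        constraints = [
            f"    sh:property [ sh:path ex:{r['property']} ; "
            f"sh:datatype {r['datatype']} ; sh:minCount {r['min_count']} ]"
            for r in rows
            if r["concept"] == c
        ]
        out.append(f"ex:{c}Shape a sh:NodeShape ; sh:targetClass ex:{c} ;")
        out.append(" ;\n".join(constraints) + " .")
    return "\n".join(out)


print(to_turtle(SHEET))
```

The schema and the validation shapes come from the same rows, so they cannot drift apart. A production tool would also need to handle object properties, value sets, and cardinality maxima.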
Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-05-06 | DOI: 10.1186/s13326-025-00329-2
Tsaqif Naufal, Rahmad Mahendra, Alfan Farizki Wicaksono
Purpose: Online consumer health forums offer internet users an alternative source of health information when they need specific details that articles and other one-way communication channels do not readily provide. However, these forums can be less effective because few healthcare professionals actively participate, which slows responses to user inquiries. One potential solution is a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that these three modules would enable the system to identify the critical components of a question. This would support a deeper understanding of the question and allow more effective questions to be reformulated from the extracted key information.
Methods: This work makes two contributions related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks in the Indonesian language domain using transformer-based models with nine distinct encoder variants. Our feature studies revealed interdependence among the three tasks. Consequently, we propose several multi-task learning (MTL) models in both pairwise and three-way configurations, using parallel and hierarchical architectures.
Results: Measured by chunk-level F1-score, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, a different model achieved the best performance on each task, and IndoNLU_LARGE obtained the highest average score. These results suggest that a larger model does not always perform better. We also found no indication of whether Indonesian or multilingual language models generally perform better on our tasks. In pairwise MTL settings, pairing tasks could outperform the STL baseline on all three tasks. We varied the loss weights across our three-way MTL models but did not identify a consistent pattern. Some configurations improved MER and KE performance, but none surpassed the best pairwise MTL model on the SR task.
Conclusion: We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points split into 773 training instances, 200 validation instances, and 200 test instances. We then used transformer-based models to set a baseline for all three tasks. Our MTL experiments suggested that additional informat…
Journal of Biomedical Semantics 16(1): 8 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057135/pdf/
Citations: 0
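The inter-annotator agreement figures in this abstract are chunk-level F1 scores. A common way to compute such a score treats one annotator's BIO-tagged spans as the reference. A chunk then counts as a match only when its start, end, and label all agree exactly. This approach is assumed here and is not taken from the paper. The tag labels in the example (`SYM`, `DRUG`) are hypothetical.

```python
def extract_chunks(tags):
    """Collect (start, end, label) spans from a BIO tag sequence (end is exclusive)."""
    chunks, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last chunk
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                chunks.add((start, i, label))
            # B- (or a stray I-) opens a new chunk; O closes without opening one.
            start, label = (None, None) if tag == "O" else (i, tag[2:])
    return chunks


def chunk_f1(reference, candidate):
    """Exact-match chunk F1 between two annotations of the same token sequence."""
    ref, cand = extract_chunks(reference), extract_chunks(candidate)
    if not ref and not cand:
        return 1.0
    tp = len(ref & cand)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(cand), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


annotator_1 = ["B-SYM", "I-SYM", "O", "B-DRUG"]
annotator_2 = ["B-SYM", "I-SYM", "O", "O"]
print(round(chunk_f1(annotator_1, annotator_2), 4))  # → 0.6667
```

Exact-match F1 is symmetric in the two annotators, so it does not matter which annotator is treated as the reference. It is also strict: a span that is off by one token counts as both a false positive and a false negative, which partly explains why agreement on fuzzy tasks like keyphrase extraction can be low.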
Semantics in action: a guide for representing clinical data elements with SNOMED CT.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-03-27 | DOI: 10.1186/s13326-025-00326-5
Julien Ehrsam, Christophe Gaudet-Blavignac, Mirjam Mattei, Monika Baumann, Christian Lovis
Background: Clinical data are abundant, but meaningful reuse remains limited. Semantic representation using SNOMED CT can improve research, public health, and quality of care. However, the lack of applied guidelines to industrialise the process hinders sustainability and reproducibility. This work describes a guide for the semantic representation of data elements with SNOMED CT and addresses the challenges encountered during its application. Representation of the institutional data warehouse began with the guidelines proposed by SNOMED International and other groups. However, applying manual, expert-driven representation at large scale led to the development of additional rules.
Results: An eight-rule step-by-step guide was developed iteratively through focus groups. The rules are continuously refined through use and growing coverage, and are tested in practice to ensure they achieve the desired outcome. All rules prioritize semantic accuracy, which is the main goal of our strategy. The rules are divided into four groups. One covers understanding the data correctly (Context), and three cover using SNOMED CT properly (Single concepts first, Approved post-coordination, Extending post-coordination).
Conclusions: This work provides a practical framework for semantic representation using SNOMED CT that enables greater accuracy and consistency by promoting a common method. The guide addresses the challenges of large-scale implementation. It also supports the shift from data-centric models to a semantic-centric approach, improving interoperability and enabling more effective reuse of clinical data.
Journal of Biomedical Semantics 16(1): 7 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948947/pdf/
Citations: 0
Standardizing free-text data exemplified by two fields from the Immune Epitope Database.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-03-22 | DOI: 10.1186/s13326-025-00324-7
Sebastian Duesing, Jason Bennett, James A Overton, Randi Vita, Bjoern Peters
Background: Unstructured data such as free text make up a large share of publicly available biomedical data. However, they are underused in automated analyses because extracting meaning from them is difficult. Normalizing free-text data, i.e., removing inessential variance, allows the data to be represented with structured vocabularies such as ontologies and supports harmonized queries over it. This paper presents an adaptable tool for free-text normalization. It also evaluates the tool on two fields curated from the literature in the Immune Epitope Database (IEDB): "age" and "data-location" (the part of a paper in which data were found).
Results: We analyzed free-text entries in the IEDB fields for subject age (4,095 distinct values) and publication data-location (251,810 distinct values). Normalization was performed in three steps: character normalization, word normalization, and phrase normalization. Each step used generalizable rules developed and applied with the tool presented in this manuscript. For the age dataset, applying 21 rules in the character stage gave 99.97% output validity, 94 rules in the word stage gave 98.06%, and 16 rules in the phrase stage gave 83.81%. For the data-location dataset, applying 39 rules in the character stage gave 99.99% output validity, 187 rules in the word stage gave 98.46%, and 12 rules in the phrase stage gave 97.95%.
Conclusions: We developed a generalizable approach for normalizing free text in database fields that hold content on a specific topic. Creating and testing the rules is a one-time effort for each field, and the rules can then be applied to data as they are curated. The standardization achieved in the two tested datasets substantially reduced content variance. This makes the data more findable and usable, chiefly by improving search functionality and enabling links to formal ontologies.
Journal of Biomedical Semantics 16(1): 5 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929277/pdf/
Citations: 0
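The three-stage normalization (character, word, phrase) can be pictured as an ordered rule pipeline followed by a validity check on the output. The sketch below is a hypothetical miniature for an "age" field. The rules, the word mapping, and the `VALID` pattern are all invented for illustration; they are not the IEDB rule sets or the authors' tool.

```python
import re

# Hypothetical rules for an "age" field (illustrative only).
CHAR_RULES = [(r"\s+", " "), ("[\u2013\u2014]", "-")]  # collapse whitespace; en/em dash -> hyphen
WORD_RULES = {"yrs": "years", "yr": "years", "y.o.": "years", "mos": "months"}
PHRASE_RULES = [(r"^(\d+)-(\d+) years$", r"\1 to \2 years")]

# Hypothetical definition of a "valid" normalized value.
VALID = re.compile(r"^\d+( to \d+)? (years|months)$")


def normalize(value):
    s = value.strip().lower()
    for pattern, repl in CHAR_RULES:  # stage 1: character normalization
        s = re.sub(pattern, repl, s)
    s = " ".join(WORD_RULES.get(w, w) for w in s.split(" "))  # stage 2: word normalization
    for pattern, repl in PHRASE_RULES:  # stage 3: phrase normalization
        s = re.sub(pattern, repl, s)
    return s


def validity(values):
    """Fraction of normalized outputs that match the expected pattern."""
    outputs = [normalize(v) for v in values]
    return sum(bool(VALID.match(o)) for o in outputs) / len(outputs)


print(normalize("  20\u201330 Yrs "))  # → 20 to 30 years
```

Each later stage assumes the earlier ones have already run. The word stage can split on single spaces only because the character stage has collapsed whitespace. Values that no rule covers, such as "adult", fall through unchanged and are counted as invalid.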
Digital evolution: Novo Nordisk's shift to ontology-based data management.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-03-22 | DOI: 10.1186/s13326-025-00327-4
Shawn Zheng Kai Tan, Shounak Baksi, Thomas Gade Bjerregaard, Preethi Elangovan, Thrishna Kuttikattu Gopalakrishnan, Darko Hric, Joffrey Joumaa, Beidi Li, Kashif Rabbani, Santhosh Kannan Venkatesan, Joshua Daniel Valdez, Saritha Vettikunnel Kuriakose
Abstract: The amount of biomedical data is growing, and managing it is increasingly challenging. The Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, but adopting them has proven difficult, especially in large enterprises such as pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. We present both our technical blueprint and our approach to organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organization's digital aspirations for data federation and discovery fuelled by artificial intelligence. We share the lessons learned to foster dialogue with parties navigating similar waters, and to advance data management, semantics, and data-driven drug discovery together.
Journal of Biomedical Semantics 16(1): 6 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929979/pdf/
Citations: 0
New and revised gene ontology biological process terms describe multiorganism interactions critical for understanding microbial pathogenesis and sequences of concern.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-03-21 | DOI: 10.1186/s13326-025-00323-8
Gene Godbold, Jody Proescher, Pascale Gaudet
Background: The United States government has issued a new framework for screening synthetic nucleic acids. Beginning in October 2026, the framework calls for screening sequences of 50 nucleotides or longer that are known to contribute to pathogenicity or toxicity for humans, regardless of the taxon from which they originate. Distinguishing sequences that encode pathogenic and toxic functions from those that lack them is not simple.
Objectives: Our project aimed to discern, describe, and catalog sequences involved in microbial pathogenesis from the scientific literature. We recognize a need for better terminology to designate pathogenic functions that are relevant across the entire range of existing parasites.
Methods: We canvassed publications on microbial pathogens of humans, other animals, and some plants, and collected thousands of sequences that enable the exploitation of hosts. We compared the sequences with one another and grouped them by the host biological processes they subvert and the consequences for the host. We then developed terms for many of the varied pathogenic functions of sequences that parasitic microbes use to exploit hosts. Finally, we applied these terms systematically to our sequence dataset.
Results/conclusions: When appropriately applied to relevant sequences, the enhanced and expanded terms enable a quick and pertinent evaluation of a sequence's ability to endow a microbe with pathogenic function. This will allow providers of synthetic nucleic acids to rapidly assess sequences ordered by their customers for pathogenic capacity, which will help fulfill the new US government guidance.
Journal of Biomedical Semantics 16(1): 4 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11927349/pdf/
Citations: 0
Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer's disease.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-03-20 | DOI: 10.1186/s13326-025-00328-3
Yiyuan Pu, Daniel Beck, Karin Verspoor
Background: In Literature-based Discovery (LBD), Swanson's original ABC model brought together isolated public knowledge statements and assembled them to infer putative hypotheses via logical connections. Modern LBD studies scale up this approach through automation. They typically rely on a simple entity-based knowledge graph whose basic building blocks are co-occurrences and/or semantic triples. However, our analysis of a knowledge graph constructed for a recent LBD system reveals limitations arising from such pairwise representations, and these limitations in turn harm knowledge inference. Using LBD as the context and motivation, we explore the limitations of representing knowledge in knowledge graphs with pairwise relationships only, and we identify the impacts of these limitations on knowledge inference. We argue that enhanced knowledge representation benefits biological knowledge representation in general. It also improves both the quality and the specificity of hypotheses proposed with LBD.
Results: We systematically analyzed one co-occurrence-based LBD system focusing on Alzheimer's disease. We identify 7 types of limitations arising from the exclusive use of pairwise relationships in a standard knowledge graph, including the need to capture more than two entities interacting in a single event. We also identify 3 types of negative impacts on knowledge inferred with the graph: experimentally infeasible hypotheses, literature-inconsistent hypotheses, and oversimplified hypothesis explanations. We also present an indicative distribution of different types of relationships. Pairwise relationships are an essential component of representation frameworks for knowledge discovery. However, only 20% of discoveries are perfectly represented with pairwise relationships alone. A further 73% require a combination of pairwise and nested relationships, and the remaining 7% are represented with pairwise relationships, nested relationships, and hypergraphs.
Conclusion: The standard entity pair-based knowledge graph is essential for representing basic binary relations. However, we argue that it has important limitations for comprehensive biological knowledge representation and affects downstream tasks such as proposing meaningful discoveries in LBD. These limitations can be mitigated by integrating more semantically complex knowledge representation strategies, including capturing collective interactions and allowing nested entities. More sophisticated knowledge representation will benefit biological fields by yielding more expressive knowledge graphs. Downstream tasks such as LBD can also benefit from richer representations. These enable implicit knowledge discoveries and explanations for disease diagnosis, treatment, and mechanism that are more biologically meaningful.
Journal of Biomedical Semantics 16(1): 3 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924609/pdf/
Citations: 0
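Swanson's ABC model can be sketched as open discovery over a pairwise co-occurrence graph. Starting from A, it proposes every C that shares an intermediate B with A but is not yet linked to A directly. The edges below are invented for illustration. Each edge is a binary pair, so this representation has no way to record a single event involving three or more entities as one unit, which is the limitation the paper examines.

```python
from collections import defaultdict

# Hypothetical, undirected co-occurrence edges; entity names are illustrative only.
EDGES = [
    ("Alzheimer's disease", "amyloid beta"),
    ("amyloid beta", "drug X"),
    ("Alzheimer's disease", "tau"),
    ("tau", "drug Y"),
    ("Alzheimer's disease", "drug Y"),
]


def abc_hypotheses(edges, a):
    """Open-discovery ABC inference: map each candidate C to the B terms linking it to A."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    hypotheses = defaultdict(set)
    for b in graph[a]:
        for c in graph[b]:
            if c != a and c not in graph[a]:  # skip A itself and already-known A-C links
                hypotheses[c].add(b)
    return dict(hypotheses)


print(abc_hypotheses(EDGES, "Alzheimer's disease"))  # → {'drug X': {'amyloid beta'}}
```

"drug Y" is not proposed because it already co-occurs with A directly. The bridging B terms kept for each C are the only "explanation" such a graph can offer, which is why pairwise-only graphs tend to produce the oversimplified explanations the paper reports.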
Gene expression knowledge graph for patient representation and diabetes prediction.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-03-08 | DOI: 10.1186/s13326-025-00325-6
Rita T Sousa, Heiko Paulheim
Abstract: Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of gene expression data. Gene expression data can provide valuable insights, but two challenges arise. Expression datasets usually contain a limited number of patients, and data from datasets covering different gene expressions cannot easily be combined. This work proposes a novel approach to these challenges that uses knowledge graphs (KGs), a unique tool for biomedical data integration. The approach integrates multiple gene expression datasets with domain-specific knowledge and learns uniform representations for patients contained in different, otherwise incompatible datasets. We explore different strategies and KG embedding methods to generate vector representations that serve as inputs to a classifier. Extensive experiments demonstrate the efficacy of the approach. Integrating multiple gene expression datasets with domain-specific knowledge about protein functions and interactions improved the weighted F1-score for diabetes prediction by up to 13%.
Journal of Biomedical Semantics 16(1): 2 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11889825/pdf/
Citations: 0
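The reported gains are in weighted F1, which averages the per-class F1 scores with each class weighted by its support in the true labels. Below is a minimal standard-library sketch of that metric on made-up labels. It is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp  # true members of cls that were predicted as something else
        f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
        score += f1 * n / total
    return score


# Made-up labels: 1 = diabetic, 0 = non-diabetic.
print(round(weighted_f1([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]), 4))  # → 0.6
```

Weighting by support keeps the score close to the majority class's F1. This suits small, imbalanced cohorts such as gene expression datasets, but a gain in the minority class can look smaller than it is.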
Expanding the concept of ID conversion in TogoID by introducing multi-semantic and label features.
IF 1.6 | Zone 3 | Engineering & Technology
Journal of Biomedical Semantics | Pub Date: 2025-01-08 | DOI: 10.1186/s13326-024-00322-1
Shuya Ikeda, Kiyoko F Aoki-Kinoshita, Hirokazu Chiba, Susumu Goto, Masae Hosoda, Shuichi Kawashima, Jin-Dong Kim, Yuki Moriya, Tazro Ohta, Hiromasa Ono, Terue Takatsuki, Yasunori Yamamoto, Toshiaki Katayama
Background: TogoID (https://togoid.dbcls.jp/) is an identifier (ID) conversion service designed to link IDs across diverse categories of life science databases. TogoID has been a valuable tool for bioinformatics. It can obtain IDs connected through different semantic relationships, offers a user-friendly web interface, and has a regular automatic data update system.
Results: We have recently expanded TogoID's ability to represent semantics between datasets so that it can handle multiple semantic relationships within a dataset pair. With this enhancement, TogoID can distinguish between relationships such as "glycans bind to proteins" and "glycans are processed by proteins". Two further new features have been added. TogoID can now display the labels corresponding to database IDs, which makes the relationships between the various IDs easier to interpret. It can also convert labels to IDs, which extends the entry points for ID conversion. URL parameters now reproduce the state of TogoID's web application, so users can share complex search results through a simple URL.
Conclusions: These advancements improve TogoID's utility in bioinformatics and allow researchers to explore complex ID relationships. With its new multi-semantic and label features, TogoID expands the concept of ID conversion and supports more comprehensive and efficient data integration across life science databases.
Journal of Biomedical Semantics 16(1): 1 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11708180/pdf/
Citations: 0
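One way to model multiple semantic relationships within a dataset pair is a link table keyed by dataset pair and relation. Conversion then follows a chosen route of hops. This is a hypothetical sketch: the dataset names, relation labels, and IDs are invented, and it neither calls nor reproduces TogoID's actual API.

```python
# Hypothetical link table: (source dataset, target dataset) -> relation -> {source ID: [target IDs]}.
LINKS = {
    ("glycan", "protein"): {
        "binds": {"G1": ["P1", "P2"]},
        "processed_by": {"G1": ["P3"]},
    },
    ("protein", "gene"): {
        "encoded_by": {"P1": ["GENE_A"], "P3": ["GENE_B"]},
    },
}


def convert(ids, route):
    """Follow a route of (source, target, relation) hops and return the reachable IDs."""
    current = set(ids)
    for source, target, relation in route:
        table = LINKS[(source, target)][relation]
        # IDs with no link under this relation simply drop out of the result.
        current = {t for i in current for t in table.get(i, [])}
    return current


print(convert(["G1"], [("glycan", "protein", "binds"), ("protein", "gene", "encoded_by")]))
# → {'GENE_A'}
```

Because the relation is part of the key, the same glycan ID reaches different genes depending on whether the route uses "binds" or "processed_by". That distinction is lost when a dataset pair carries only a single, untyped link table.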